Abstract
A wide range of segmentation techniques continues to evolve in the literature on scene analysis. Many of these approaches have been constrained to limited applications or goals. This survey analyzes the complexities encountered in applying these techniques to color images of natural scenes involving complex textured objects. It also explores new ways of using the techniques to overcome some of the problems which are described. An outline of considerations in the development of a general image segmentation system which can provide input to a semantic interpretation process is distributed throughout the paper.

In particular, the problems of feature selection and extraction in images with textural variations are discussed. The approaches to segmentation are divided into two broad categories, boundary formation and region formation. The tools for extraction of boundaries involve spatial differentiation, non-maxima suppression, relaxation processes, and grouping of local edges into segments. Approaches to region formation include region growing under local spatial guidance, histograms for analysis of global feature activity, and finally an integration of the strengths of each by a spatial analysis of feature activity. A brief discussion of attempts by others to integrate the segmentation and interpretation phases is also provided. The discussion is supported by a variety of experimental results.
1. Introduction

In the design of a general computer vision system for interpreting images, one must face many of the issues confronting the development of complex AI systems in general. Image understanding requires the processing of vast quantities of sensory data, with noise from the sensing mechanisms as well as non-semantic information obscuring the semantically significant entities that are to be perceived. One must organize both processes and knowledge structures in a modular fashion so that they interact in a flexible manner (Hanson & Riseman [1976], Arbib & Riseman [1976]).

The complexity of designing and implementing such systems has typically led to a decomposition of the problem into distinct subsystems for segmentation and interpretation, often referred to as 'low-level' and 'high-level' processing, respectively. We view the goal of the initial stages of processing in visual systems as segmentation: a transformation of the data into a partitioned image whose parts are in a representation more amenable to semantic processing.

The general problems of segmentation involve processing arrays of numeric values representing brightness (and color) in order to extract features of boundaries and regions over local areas or 'windows'. By a variety of means this information can be aggregated, labelled with symbolic names and attributes, and then interfaced to knowledge structures by interpretation processes.

There has been some debate over the degree to which semantic information should be employed in the partitioning of an image. The problem of segmenting scenes with textural variations is rather challenging, and it is clear that the context of local data in a picture influences our interpretation of that data. It is then reasonable to ask why the image should not be processed immediately with knowledge of 'chairs', 'tables', or any other objects expected in the image. This will be discussed in more detail later in this paper, but it is worthwhile to make our views on this matter clear now.

A vision system which is to operate in a constrained domain with constrained goals will be able to use such knowledge to its advantage. However, this means that the segmentation operations cannot be applied to a new domain without providing the new knowledge for that domain; in each case the content, form, and manner of use of the domain-dependent knowledge must be specified. This might also involve serious computational costs, depending on the amount of knowledge and how it is used. It implies a reconstruction and evaluation of the segmentation system in each new application. It seems to us that sensory visual data contain a large degree of non-semantic patterning which can allow effective, although not perfect, initial segmentation without recourse to semantics.

A similar view has been expressed by Zucker, Rosenfeld and Davis [1975]. The human visual system can do quite well in partitioning nonsense images, even when neighboring regions are highly textured.

For these reasons we view the problem of image understanding as one of performing initial segmentation via general procedures, feeding this low-level output to a high-level system, and then allowing feedback loops so that the interpretation processes can influence refined segmentation. This allows semantic information to influence segmentation in a goal-oriented way without coupling all such knowledge directly into the low-level processes. In this paper, however, we will look primarily at computer techniques for a one-way transformation from 'raw' visual input of static images to a segmented array.



From  this  point  of  view,  the  segmentation  processes  provide  a compact 
description  of  the  location  and  characteristics  of  visually  distinct 
areas  of  the  image.  However,  the  local  analyses  may  generate  a great  deal 
of  spurious  activity  because  objects  in  images  do  not  appear  as  uniformly 
colored  areas  (as  in  cartoon  drawings)  but  rather  have  natural  textural 
variations,  reflectance,  shadows,  etc.  Thus,  the  integration  of  local 
processing  into  globally  consistent  boundaries  and  regions  is  not  at 
all  straightforward. 

From a classic AI point of view, this analysis involves an enormous search space. If one adopts the ideal goal of bringing together these local representations of data into an optimal global representation, one must immediately face the combinatorics of the problem and the question of computational efficiency. Global brute-force search is quite impossible, and of course one would not even recognize acceptable solutions without the application of higher-level processes to each alternative. Yet humans can understand images of natural scenes even in the presence of a high degree of noise and local textural variations. Clearly, the different phases of processing that are employed must be integrated, and techniques to constrain the alternatives within each are necessary. Interaction between the analyses of local visual areas can be employed, but there must be provision for global guidance: not all possible global boundaries can be considered, but local noise in the formation of a long straight line should be handled by the global view of the line. In this paper we will examine some of the ways of dealing with these problems.

In the next two sections of the paper, we examine feature extraction, color, and texture. The techniques for boundary formation and for region formation, which are the main focus of this paper, are presented in the two sections after that, with a concluding discussion in the last section.


2. Feature Extraction


2.1 Raw Input and Color


First, then, what is the 'raw' visual input? In an animal, it is simply the pattern of light (distributed across the spectrum) falling on the animal's retinas. This pattern changes over time as the animal moves and the environment changes. In a computer vision system, the input may be far more restricted. The simplest input is a black-and-white photograph, which provides a two-dimensional map of light intensity in a static scene. Such an input can be subjected to boundary formation and texture analysis. In this paper, we shall present computer techniques for analyzing a static scene enriched by color. The usual way of representing a color photograph is to code it as three arrays, each sampling the brightness of the pattern through a different standard filter. Usually, the peak frequencies of the filters correspond to the three primary colors of red, green, and blue. A similar arrangement holds in the eye: each of the three types of cone in the retina has its peak sensitivity roughly near one of the three primary colors.


Figure 1 depicts a simple house scene viewed through each of the three filters and also averaged into a black-and-white (B & W) monochromatic image. If one views the blue component (Fig. 1c) of the colored image as a black-and-white photo, then bright regions are those with a strong blue component. Since white light has all spectral components, both blue sky and white clouds may appear indistinguishable in the blue image. However, the red and green components of the image will portray the boundary between sky and clouds.

The color of the sky is actually cyan (greenish blue), which has a much larger green contribution than red. Consequently the red component (Fig. 1a) of the image shows the sky area to be much darker than the cloud area, while the contrast is not as great in the green component (Fig. 1b). By viewing the three images together one can estimate the colors of other areas; e.g., the roof and unshadowed side of the house are reddish, the grass is yellowish green (high in green, moderate in red), the house trim is white (high in all components), etc.

Consequently, even the roughest sense of the color of an object cannot be determined without looking at all three values. On a dark-to-light gray scale from 0 to 63, a red value of 40 could represent:

1) a pure red (if the other components are 0); or

2) yellow (if the green value is 40, and blue is 0); or

3) white (if both green and blue are also 40); etc.


2.2 Hue, Saturation and Intensity

Because of these problems, the raw data are often transformed into a different coordinate system which is more intuitive to the human user: hue, saturation and intensity, often referred to in this paper as HSI features or parameters. The following is a brief discussion of the definition of these features.

The information associated with each point (i,j) can be viewed as a vector in 3-space, [R(i,j), G(i,j), B(i,j)]. We restrict each element of the vector to the range [0,63]. To help the reader visualize this, we adapt the clear and simple presentation provided by Schacter, Davis and Rosenfeld [1975], and view this as a vector within the 63 × 63 × 63 cube depicted in Figure 2a.

By viewing the brightness of a point as an average of the three primary color components, it is clear that the origin [0,0,0] is black and maximum brightness [63,63,63] is white. We may define a gray scale of brightness or 'intensity' by

$$I = \frac{R + G + B}{3}$$

This is proportional to the length of the projection of the vector [R,G,B] associated with any point onto the diagonal (gray-scale) vector shown in Figure 2a: that projection has length (R + G + B)/√3 = √3·I. Thus, points in the color cube get progressively brighter as one moves from the bottom right to the upper left corners.
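The relation between intensity and the projection onto the black-to-white diagonal can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and components are assumed to lie on the 0 to 63 scale used in the text.

```python
import math

def intensity(R, G, B):
    """Gray-scale intensity: the average of the three color components."""
    return (R + G + B) / 3.0

def gray_axis_projection(R, G, B):
    """Length of the projection of [R, G, B] onto the unit vector
    (1, 1, 1) / sqrt(3) that runs along the black-to-white diagonal."""
    return (R + G + B) / math.sqrt(3.0)

for rgb in [(0, 0, 0), (40, 0, 0), (40, 40, 0), (63, 63, 63)]:
    I = intensity(*rgb)
    L = gray_axis_projection(*rgb)
    # The projection length differs from I only by the constant factor sqrt(3).
    print(rgb, round(I, 2), round(L, 2), math.isclose(L, math.sqrt(3.0) * I))
```

Because the two quantities differ by a constant factor, either one orders points by brightness in the same way.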

Other colors are obtained as one combines the primary colors in different amounts. The corners of the color cube are labelled in Figure 2b with the names of perceived colors which are formed from the three primary colors. For example, red and green in equal amounts produce yellow when the blue component is 0. Thus, one can imagine the right face of the cube in Figure 2b varying across green, yellowish-green, yellow, yellowish-red (orange), and red. A diagonal line from black to yellow (i.e., red and green components are equal) will represent yellow at different levels of intensity.¹ Now we need a way of describing the points inside the cube as well as on the surface of the cube. The points in a plane perpendicular to the gray-scale vector from black to white are of equal intensity. The plane passing through the corners R, G and B cuts the cube in the equilateral triangle depicted in Fig. 2c and 2d. At other levels of intensity the cross-section has a different shape and size: it is a hexagon at intermediate intensities, and a triangle that shrinks toward a single point as one approaches black or white. The implication is that a smaller range of color combinations can be formed as one approaches minimum and maximum intensity (black and white).

¹ The problem is much more complicated from a psychological view because our perception of the color yellow is also a function of intensity, and below some threshold we might call it another color such as tan, brown, blackish-brown, or black. Human perception of color is a very complicated process and we will not be able to treat this problem in detail. The reader is referred to Evans [1948], Cornsweet [1970], Bouma [1971] and Beck [1975].

Figure 1: Digitized images of a natural color scene. (a) Red, (b) Green, and (c) Blue components are shown. (d) Intensity (or brightness) is an average of the first three images; subareas A and B will be used in examples later in the paper.

The color triangle of Fig. 2c can now be used to describe two other characteristics of color space, hue and saturation, which are independent of intensity. The intersection of the color triangle with the line between the origin and any point P in the color cube defines the projection P' of P onto the color triangle. The placement of this point P' is defined by normalizing the values of R, G and B:

$$r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad b = \frac{B}{R+G+B}$$

which implies that r + g + b = 1. Since there are only two independent variables, it is convenient to convert the equilateral color triangle into a right color triangle with the point P' defined by r and g (on the R and G axes, respectively), as shown in Fig. 2e.
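The normalization can be sketched directly. This is an illustrative fragment with hypothetical function names. It shows that every point on a ray from black projects to the same P', and that (r, g) alone locates P' in the right triangle because b = 1 − r − g. Black (0, 0, 0) has no defined projection, so the sketch returns None there; how the original system treated that case is not stated.

```python
def normalize(R, G, B):
    """Project (R, G, B) onto the color triangle r + g + b = 1.
    Returns None for black, where the projection is undefined."""
    total = R + G + B
    if total == 0:
        return None
    return (R / total, G / total, B / total)

def right_triangle_coords(R, G, B):
    """Only r and g are independent (b = 1 - r - g), so P' in the
    right color triangle is given by the pair (r, g)."""
    p = normalize(R, G, B)
    return None if p is None else (p[0], p[1])

# Dark and bright yellow lie on the same ray from black, so they share P'.
print(normalize(20, 20, 0), normalize(60, 60, 0))   # both (0.5, 0.5, 0.0)
print(right_triangle_coords(63, 0, 0))              # (1.0, 0.0): the R corner
```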

Now one can specify the hue and saturation of point P'. Intuitively, hue can be thought of as representing the type of color. Saturation is a measure of the richness or purity of the color and is inversely related to the amount of white light diluting the hue. The colors pink and scarlet may both have the same hue, but pink is unsaturated while scarlet is highly saturated. If one represents the center of the color triangle as W (where W is the neutral point representing the projection of white and all gray levels between white and black), then the extension of the line from W through P' to the perimeter of the triangle describes the hue of P'; the perimeter point is denoted by H in Fig. 2e. There is a one-to-one mapping between points on the perimeter of the color triangle and the angular orientation θ with respect to an arbitrary reference point, in this case R. Thus, hue can be represented as an angle θ with red as 0°, green as 120°, and blue as 240°.
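One way to compute this angle without constructing the triangle explicitly is a closed-form arccosine expression that is commonly used for HSI hue in later image-processing literature. It measures the angle of P' about W in the equilateral triangle, starting from red. Treat the sketch as an assumption: the text does not say how hue was actually computed, and `hue_degrees` is a hypothetical name.

```python
import math

def hue_degrees(R, G, B):
    """Angle of P' about the neutral point W in the equilateral color triangle,
    measured from the direction of red: red = 0, green = 120, blue = 240.
    Returns None when R == G == B (P' coincides with W, so hue is undefined)."""
    num = 0.5 * ((R - G) + (R - B))
    den = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if den == 0:
        return None
    # Clamp to [-1, 1] to guard against rounding before taking the arccosine.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    # acos only covers 0..180 degrees; points with B > G lie beyond 180.
    return theta if B <= G else 360.0 - theta

corners = {"red": (63, 0, 0), "yellow": (63, 63, 0), "green": (0, 63, 0),
           "cyan": (0, 63, 63), "blue": (0, 0, 63), "magenta": (63, 0, 63)}
for name, rgb in corners.items():
    print(name, round(hue_degrees(*rgb)))
# prints, one per line: red 0, yellow 60, green 120, cyan 180, blue 240, magenta 300
```

The denominator is zero only when R = G = B, which is exactly the set of gray points that project onto W.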

Saturation of P' is computed as a percentage of the distance from W to the perimeter point H:

$$S = \frac{|P' - W|}{|H - W|} \times 100\%$$

If P' is anywhere on the perimeter of the color triangle, then it has a saturation of 100%, while the point W (white) is completely diluted and has a saturation of 0%.
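The distance ratio has a simple closed form. Walking from W = (1/3, 1/3, 1/3) through P' = (r, g, b), each normalized coordinate changes linearly, and the smallest one reaches 0 exactly at the perimeter point H. The ratio therefore reduces to 1 − 3·min(r, g, b). The sketch below (hypothetical names, returning a fraction rather than a percentage) computes both the closed form and the explicit ray intersection so the two can be compared. The pink and scarlet triples are made-up illustrative values that share a hue.

```python
def saturation(R, G, B):
    """S = |P' - W| / |H - W| as a fraction in [0, 1], via 1 - 3*min(r, g, b)."""
    total = R + G + B
    if total == 0:
        return None  # black has no projection P' on the color triangle
    return 1.0 - 3.0 * min(R, G, B) / total

def saturation_by_geometry(R, G, B):
    """Same ratio, found by intersecting the ray W -> P' with the triangle's
    edges (used only to cross-check the closed form)."""
    total = R + G + B
    p = [R / total, G / total, B / total]
    d = [c - 1.0 / 3.0 for c in p]          # direction from W to P'
    if all(abs(c) < 1e-12 for c in d):
        return 0.0                          # P' is W itself
    # The ray W + t*d leaves the triangle when a coordinate reaches 0;
    # P' sits at t = 1 and H at t = t_hit, so S = 1 / t_hit.
    t_hit = min((1.0 / 3.0) / -c for c in d if c < 0)
    return 1.0 / t_hit

pink, scarlet = (63, 40, 40), (63, 5, 5)    # same hue (red), different purity
for name, rgb in [("pink", pink), ("scarlet", scarlet), ("gray", (30, 30, 30))]:
    print(name, round(saturation(*rgb), 3), round(saturation_by_geometry(*rgb), 3))
```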

Figure 2: Transformation of the raw data (R,G,B) into parameters of hue, saturation, and intensity (H,S,I). (a) The color cube and (b) the names of colors at the corners. (c) Formation of the color triangle. (d) Projection of a point P' on the color triangle. (e) Only two of the parameters (r,g,b) are independent, producing the right color triangle; H and S are shown in this representation.

The HSI features that we have defined are not entirely independent. If one examines the diagrams in Fig. 2, one can see that totally saturated yellow, cyan or magenta can have an intensity twice as great as the highest-intensity red, blue or green which still remains totally saturated. Certain colors may only be perceived within a range of one of the parameters; e.g., yellow is seen as brown only when the intensity is low. Thus, if a mapping from HSI into symbolic color names is desired, one must take into account dependencies between the HSI parameters. For other treatments of color see Tenenbaum et al. [1974] and Sloan and Bajcsy [1975].
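The factor of two follows from the definitions: a color is totally saturated only when its smallest component is 0, so the brightest saturated primary has one component at the maximum, while the brightest saturated secondary has two. A short arithmetic check on the 0 to 63 scale, with hypothetical names:

```python
def intensity(R, G, B):
    return (R + G + B) / 3.0

def saturation(R, G, B):
    total = R + G + B
    return None if total == 0 else 1.0 - 3.0 * min(R, G, B) / total

MAX = 63
primaries = {"red": (MAX, 0, 0), "green": (0, MAX, 0), "blue": (0, 0, MAX)}
secondaries = {"yellow": (MAX, MAX, 0), "cyan": (0, MAX, MAX), "magenta": (MAX, 0, MAX)}

for name, rgb in {**primaries, **secondaries}.items():
    print(name, "S =", saturation(*rgb), "I =", intensity(*rgb))
# primaries: S = 1.0, I = 21.0; secondaries: S = 1.0, I = 42.0

# Raising the other components of red to brighten it costs saturation.
print("brighter red", saturation(63, 10, 10), intensity(63, 10, 10))
```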

Recently Kender [1976] has addressed a problem that some have known about, but which has not been discussed in the literature. In the transformation to normalized components or HSI, there are points of instability where arbitrarily small changes in R, G, B will produce large differences in the transformed components; e.g., near point W, small changes in the raw components can cause very large changes in hue and saturation. Kender's treatment is a very thorough numerical analysis of the computation and use of color, but is beyond the scope of this paper.
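The instability is easy to reproduce. The sketch below is only an illustration, not a reproduction of the numerical analysis cited above; it assumes the closed-form arccosine hue expression commonly used for HSI and the 1 − 3·min(r, g, b) form of saturation, with hypothetical function names. A one-unit change to a mid-gray point swings the hue by 120° or more, and near black a one-unit change flips saturation between its extremes.

```python
import math

def hue_degrees(R, G, B):
    """Closed-form HSI hue in degrees (red = 0); None for gray points."""
    num = 0.5 * ((R - G) + (R - B))
    den = math.sqrt((R - G) ** 2 + (R - B) * (G - B))
    if den == 0:
        return None
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    return theta if B <= G else 360.0 - theta

def saturation(R, G, B):
    total = R + G + B
    return None if total == 0 else 1.0 - 3.0 * min(R, G, B) / total

gray = (32, 32, 32)
print("hue of", gray, "=", hue_degrees(*gray))          # undefined at W
for rgb in [(33, 32, 32), (32, 33, 32), (32, 32, 33)]:
    # Each point differs from the gray by one unit in one component.
    print("hue of", rgb, "=", round(hue_degrees(*rgb)),
          "saturation =", round(saturation(*rgb), 3))

# Near black, a one-unit change flips saturation between 100% and 0%.
print(saturation(1, 0, 0), saturation(1, 1, 1))
```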

Most of the information with respect to boundaries seems to be visible in the B & W intensity array of Fig. 1d. Thus, one can avoid the problems of color if one is willing to risk the disappearance of boundaries between areas of distinct color but similar intensity. We believe that color information is extremely useful for interpretation and, despite the potential problems, will continue to refer to the HSI features throughout this paper.

2.3 Extracting Other Features Over Windows of Variable Size

The major complexity that arises in segmentation is that the areas to be partitioned are usually not invariant across the primitive parameters of hue, saturation, and intensity (HSI). These problems are intertwined with the complexities of texture, which will be treated later in this paper. What we now stress is that even when scene analysis works with static color input, the features upon which segmentation algorithms operate need not be restricted to the HSI values associated with image points. For example, an algorithm for boundary detection may only produce the correct results if it operates on some of the average HSI parameters computed across a local window of the right size; the proper boundary may only be obvious to local operators after some degree of blurring (which provides a more global viewpoint). Thus, one is faced with analyzing features of local windows of differing sizes as well as of individual points at the resolution level of the image.
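The point about blurring can be illustrated with a box average over a square window. In the hypothetical image below, both textures contain large point-to-point jumps, so a local difference operator applied to raw values fires everywhere. After averaging over a 3 × 3 window the interior of each texture becomes flat, and the only remaining change sits at the boundary between them. The window size and the "valid-only" border handling (the average is computed only where the window fits inside the image) are assumptions of this sketch.

```python
def box_average_valid(img, half):
    """Mean over a (2*half+1)-square window, computed only where the window
    lies entirely inside the image; output shrinks by 2*half per side."""
    k = 2 * half + 1
    rows, cols = len(img), len(img[0])
    return [[sum(img[y][x] for y in range(i, i + k) for x in range(j, j + k)) / (k * k)
             for j in range(cols - k + 1)]
            for i in range(rows - k + 1)]

def steps(row):
    """Absolute differences between horizontally adjacent values."""
    return [abs(b - a) for a, b in zip(row, row[1:])]

# Hypothetical image: left columns cycle 10, 30, 50; right columns cycle 20, 40, 60.
img = [[10, 30, 50] * 2 + [20, 40, 60] * 2 for _ in range(4)]
blurred = box_average_valid(img, half=1)

print("raw steps:    ", steps(img[0]))
print("blurred steps:", [round(s, 2) for s in steps(blurred[0])])
```

Larger values of `half` trade positional precision at the boundary for stronger suppression of coarser textures, which is why the proper window size matters.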

Once the constraints on what constitutes a feature are relaxed in this manner, a huge class of possibly important properties becomes available. The meaningful feature might actually be the variance of a property over a local area, not just the average of that property; the variance provides a measure of the invariance or homogeneity of a given property.
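A minimal sketch of the variance feature, assuming square windows computed only where they fit inside the image and the population variance; the function name and test image are hypothetical. The two areas below have the same local mean, so averaging alone cannot separate them, but the local variance is 0 on the uniform side and large on the textured side.

```python
def local_mean_and_variance(img, half):
    """Mean and population variance over each (2*half+1)-square window that
    lies fully inside the image; outputs are 2*half smaller per side."""
    k = 2 * half + 1
    rows, cols = len(img), len(img[0])
    means, variances = [], []
    for i in range(rows - k + 1):
        mrow, vrow = [], []
        for j in range(cols - k + 1):
            vals = [img[y][x] for y in range(i, i + k) for x in range(j, j + k)]
            m = sum(vals) / len(vals)
            mrow.append(m)
            vrow.append(sum((v - m) ** 2 for v in vals) / len(vals))
        means.append(mrow)
        variances.append(vrow)
    return means, variances

# Hypothetical image: a uniform area beside a textured area with the same mean.
row = [30] * 6 + [10, 30, 50] * 2
img = [list(row) for _ in range(4)]
means, variances = local_mean_and_variance(img, half=1)
print("mean:    ", [round(v, 1) for v in means[0]])
print("variance:", [round(v, 1) for v in variances[0]])
```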

If texture elements are extracted as atomic areas which are homogeneous in one or more of the HSI parameters, then the shape, size, and orientation of these areas might be the crucial property forming the cohesiveness of the perceived region. Although we shall not attempt to discuss the extent of the many efforts at feature extraction, properties for which computational procedures have been developed include: average of an area (blurring), average edge per unit area (spatial differentiation and then blurring), average orientation of local edges, and average spot size of uniform contiguous areas. All of these techniques have been carefully explored by Rosenfeld's group (Rosenfeld et al. [1970, 71, 72]) and are treated by Rosenfeld and Kak [1976]. Bajcsy [1973] has used frequency distribution in the Fourier domain in the analysis of texture gradients.


The computational games that are available for constructing more complex features by combining these techniques seem endless. Let us consider a sequence of operators to determine the orientation of line elements as the textural property characterizing a region. One might first compute a series of directional derivatives of the image in color space to determine the strength of color differences at various orientations; then, in a blurring process, average these values over a local window of some size to delimit the areas which contain the lines (i.e., average edge per unit area); and finally differentiate these values for each orientation. The result at a particular point represents the strength of a boundary between areas on either side of it, where the values of these areas are based on the property of average strength of edges in the particular orientation. If the first two steps are replaced by a function which computes the size of atomic areas and then averages these sizes, the differentiation might discriminate between textures of different coarseness.
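The three-step sequence can be sketched on a small synthetic image. For brevity this version works on a single intensity array rather than in color space, uses only two orientations (forward differences along rows and along columns), a clipped 3 × 3 box average, and a central difference across columns. All names, the window size, and the test image are assumptions. The left half holds vertical stripes and the right half horizontal stripes, both alternating 10 and 50, so the mean intensity is identical on both sides and only the orientation of the line elements distinguishes them.

```python
def abs_diff_x(img):
    """Step 1, one orientation: |I(i, j+1) - I(i, j)|, strong on vertical lines."""
    rows, cols = len(img), len(img[0])
    return [[abs(img[i][j + 1] - img[i][j]) if j + 1 < cols else 0.0
             for j in range(cols)] for i in range(rows)]

def abs_diff_y(img):
    """Step 1, other orientation: |I(i+1, j) - I(i, j)|, strong on horizontal lines."""
    rows, cols = len(img), len(img[0])
    return [[abs(img[i + 1][j] - img[i][j]) if i + 1 < rows else 0.0
             for j in range(cols)] for i in range(rows)]

def box_average(img, half):
    """Step 2: mean over a (2*half+1)-square window, clipped at the border."""
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            vals = [img[y][x]
                    for y in range(max(0, i - half), min(rows, i + half + 1))
                    for x in range(max(0, j - half), min(cols, j + half + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def central_diff_x(img):
    """Step 3: boundary strength |B(i, j+1) - B(i, j-1)|; 0 on border columns."""
    rows, cols = len(img), len(img[0])
    return [[abs(img[i][j + 1] - img[i][j - 1]) if 0 < j < cols - 1 else 0.0
             for j in range(cols)] for i in range(rows)]

# Hypothetical image: vertical stripes for columns 0-5, horizontal stripes for 6-11.
ROWS, COLS = 8, 12
img = [[(10 if j % 2 == 0 else 50) if j < 6 else (10 if i % 2 == 0 else 50)
        for j in range(COLS)] for i in range(ROWS)]

for name, edges in [("vertical-line edges", abs_diff_x(img)),
                    ("horizontal-line edges", abs_diff_y(img))]:
    density = box_average(edges, half=1)     # average edge per unit area
    strength = central_diff_x(density)       # differentiate the averaged values
    mid = strength[ROWS // 2]
    print(name, "-> strongest boundary at column", mid.index(max(mid)))
```

Both orientation channels place the boundary at the texture transition between columns 5 and 6, even though a plain intensity average would show no change there.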

The important point to remember is that many of the algorithms discussed can work upon any array of extracted features, not just the simple examples presented. The problem is further complicated by the choice of applying algorithms to vectors of parameters. A spatial differentiation operator might be applied to the intensity array of a static scene to find large changes in brightness, or to all three of the RGB or HSI parameters treated as a three-dimensional vector. The metric is often defined in one-dimensional or three-dimensional space, but in general it can be applied in n-dimensional space (if n features have been extracted).
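A minimal sketch of differentiation with a vector metric: the step between neighboring points is the Euclidean distance between their feature vectors, which works unchanged for any n. The strip below is a hypothetical example of two colors with equal intensity, so the boundary is invisible in the one-dimensional intensity array but clear in the three-dimensional RGB vectors. The Euclidean metric is an assumption; other metrics could be substituted. `math.dist` requires Python 3.8 or later.

```python
import math

def vector_step_x(features):
    """Distance between horizontally adjacent feature vectors (any length n)."""
    return [[math.dist(row[j], row[j + 1]) for j in range(len(row) - 1)]
            for row in features]

def intensity(rgb):
    return sum(rgb) / 3.0

# Hypothetical 1 x 4 strip: two orange-ish points beside two greenish points
# chosen to have exactly the same intensity (20).
strip = [[(40, 20, 0), (40, 20, 0), (20, 40, 0), (20, 40, 0)]]

rgb_steps = vector_step_x(strip)[0]
intensity_steps = vector_step_x([[(intensity(p),) for p in strip[0]]])[0]
print("1-D intensity steps:", intensity_steps)                   # [0.0, 0.0, 0.0]
print("3-D RGB steps:      ", [round(s, 2) for s in rgb_steps])  # [0.0, 28.28, 0.0]
```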

Given the state of the art in scene analysis, one is faced with a combinatoric explosion of alternatives; experience has not yet provided answers to this problem. It is probable that working systems will require the ability to determine dynamically the proper size and membership of the subset of features employed by the algorithms. It must be stressed that many scene analysis systems will be tailored for specific applications, be they assembly lines, cardiovascular data, chromosome analysis or satellite imagery. Much of the success of such a system will depend on the judicious choice of those local features most likely to speed up the segmentation of the restricted class of images presented by the problem domain.


3. Segmentation and Texture


3.1 Goals of Segmentation

We  shall  distinguish  two  main  approaches  to  the  segmentation  of 
natural  scenes: 

a)  Boundary  Formation  - finding  the  boundaries  which  delimit  a 
region;  and 

b)  Region  Formation  - analyzing  properties  of  areas  to  merge  or 
split  them  into  regions. 

The goals of these two types of analysis are equivalent: they both form a partition of the scene into regions and boundaries. They both must employ some type of grouping, clustering, or binding of local areas/edges together. But the focus of the first is upon differences (discontinuity) in properties, while the second is upon similarities of properties. It is quite possible that specific examples of these approaches could produce consistent or even isomorphic results; placements of boundaries in one representation might lie exactly between the regions formed in another representation. However, in practice, algorithms which operate upon arrays of numbers representing complex visual information end up taking many different forms in dealing with the problems to be described.
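The claim that the two representations can be isomorphic can be made concrete. In the sketch below (hypothetical names and labels), a region labelling is converted into "crack" boundaries lying exactly between 4-adjacent points with different labels, and the cracks are converted back into regions by flood filling without crossing any crack. For a labelling whose regions are connected and numbered in scan order, the round trip reproduces it exactly. Crack edges are one common boundary convention, chosen here as an assumption.

```python
from collections import deque

def cracks_from_regions(labels):
    """Boundary representation: the set of 'crack' edges between two
    4-adjacent points that carry different region labels."""
    rows, cols = len(labels), len(labels[0])
    cracks = set()
    for i in range(rows):
        for j in range(cols):
            for y, x in ((i, j + 1), (i + 1, j)):
                if y < rows and x < cols and labels[i][j] != labels[y][x]:
                    cracks.add(((i, j), (y, x)))
    return cracks

def regions_from_cracks(rows, cols, cracks):
    """Region representation: 4-connected flood fill that never crosses a
    crack. Labels are assigned in scan order."""
    def blocked(a, b):
        return (a, b) in cracks or (b, a) in cracks
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for si in range(rows):
        for sj in range(cols):
            if labels[si][sj] is not None:
                continue
            labels[si][sj] = next_label
            queue = deque([(si, sj)])
            while queue:
                i, j = queue.popleft()
                for y, x in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= y < rows and 0 <= x < cols and labels[y][x] is None
                            and not blocked((i, j), (y, x))):
                        labels[y][x] = next_label
                        queue.append((y, x))
            next_label += 1
    return labels

# Hypothetical labelling: 0 = sky, 1 = house, 2 = grass.
labels = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [2, 1, 1, 2],
          [2, 2, 2, 2]]
cracks = cracks_from_regions(labels)
print(len(cracks), "crack edges")
print(regions_from_cracks(4, 4, cracks) == labels)
```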

The  data  are  often  manipulated  differently  depending  on  whether  one  tries 
to  form  lines  or  extract  properties  of  areas.  A scheme  which  is  tracking 
edges  would  be  able  to  use  the  expected  straightness  of  a boundary  during 
the  processing,  while  the  region  approach  might  collect  distributed 
characteristics  of  widely  separated  local  areas. 

A powerful scene analysis system will make cooperative use of several such processes in handling all but the most sharply differentiated of regions (Arbib and Riseman [1976]).


Before we discuss particular segmentation techniques, let us look again at the scene depicted in Figure 1 and note distinguishing characteristics of the parts of the image that we would hope to extract as regions.

The sky and clouds are relatively distinct homogeneous regions. It turns out to be easy to segment the main area of sky from the rest of the scene on the basis of intensity. The grass is only slightly more difficult: since it has a rather homogeneous fine texture, it becomes distinct from the surrounding areas on the basis of 'average' hue or intensity in a blurring process. Of course the area in shadow is separated sharply from the rest of the grass on the basis of intensity, but it turns out that there is only a slight shift in hue.² This means that there is information available during segmentation either to form the shadowed grass area separately or to bind it to the unshaded grass area. In these examples, it appears that a conservative strategy which forms separate regions might be better, since there is information available to merge the regions with more confidence later under semantic guidance. If these regions are merged immediately, then problems of backtracking must be faced: the primitive regions which have been formed will need to be examined later to see whether they should be partitioned in an alternate way.

² In general, one cannot expect the hue of a shadowed area to remain unchanged. In fact, if the light is reduced significantly, the hue will be quite prone to error (Kender [1976]) as it approaches black through the lower intensities.

The more difficult areas in the scene of Figure 1 are the maze of textural variations in the tree, the smaller areas of detail which are not clearly defined in the house and shrubs, and the areas running off into shadows. The left window area is partly occluded by the leaves and branches, so that the distinct portions of the window trim and panes do not form areas which are easy to join together and interpret in the absence of context. This problem in the window area is compounded by reflections and shadows; e.g., the light areas of the window pane in the intensity image of Figure 1d are blue reflections from the sky.

One can now appreciate the difficulty of purely low-level formation of a single region covering the whole tree, including both the area of tree with sky showing through and the area of tree with obscured house in the background. The background textural elements are quite different in these two areas, yet there are common textural qualities which form one part of this macrotexture (the leaves and branches) in each case. This, and the fact that texture elements in the two areas are connected, are crucial clues which can be used to hypothesize the joining of these two regions.³

³ Once again, cooperation of high-level processes which know something about background areas showing through objects can be used to remove any remaining ambiguity. This reiterates the importance of our observation that high-level systems ought to affect segmentation at some point in the processing.

3.2 Problems and Goals in Processing Texture

The major problem for all segmentation techniques is texture.⁴ We use the term texture rather loosely to encompass the variations in the visual properties of objects/surfaces/regions, including the texture induced by reflections from an irregular surface (e.g., highlights in the crown of a tree) and by occlusion of the light source (e.g., shadows in the crown of the tree due to branches blocking the light source). The areas to be partitioned are rarely uniform in any of the simple parameters of hue, saturation and intensity.

⁴ Here we are referring to the primary difficulty in partitioning a scene into distinct visual components, not the major goal of determining the semantic relationships between the pictorial components. Later, we briefly discuss some attempts to integrate semantics with the segmentation process.

Segmentation processes are always faced with the difficulty of distinguishing between a region covered by texture elements and the texture element itself; the system might mistakenly be focusing upon the internal structure of a region. The proper area varies as a function of resolution, focus of attention, and goal orientation: is one attempting to bound the leaf, the branch, the clump of leaves, or the tree from other trees?

Many  studies  have  been  conducted  on  images  containing  at  most  two 
textures  or  the  simpler  problem  of  classifying  an  image  of  a single 
texture.  The  problems  that  appear  when  one  requires  a single  process 
to  deal  with  arbitrary  texture  types  in  regions  of  varying  size,  quality, 
and  placement  have  not  been  explored  in  the  literature.  Textures  can 
occur  as  a recursive  embedding  of  texture  types  to  make  the  task  even  more 
difficult.  Faced  with  a combinatoric  explosion  of  possibilities,  research- 
ers have  correctly  chosen  to  deal  with  restricted  classes  of  textures. 
However,  the  set  of  tools  that  have  been  developed  might  become  more 
effective  when  a system  can  employ  them  in  some  general  but  structured 
manner.  It  appears  that  the  time  is  ripe  for  an  integrative  attack  upon 
the complexities of visual texture.

There  are  three  common  goals  in  texture  analysis: 

a)  classification  of  texture  into  a set  of  categories; 

b)  description  of  texture  in  terms  of  primitive  properties;  and 

c)  segmentation  of  texture. 
In the first two cases, one usually assumes that the given sample is an example of a single texture. Because this is not true in the third case, techniques for categorization of a single given texture or formation of its description are not sufficient for determining a boundary between two areas of unknown sizes and shapes that have distinct textures.

3.3 Hierarchical Approaches to Texture Analysis

One of the main problems in segmentation of textured regions is that the textural feature whose difference is to characterize the boundary may need to be extracted over a local area of unknown size and shape.

If the information is sampled over areas that are not large with respect to texture elements or variations, then one cannot expect these local analyses to provide feature values that are invariant across the textured region. Consequently, it is desirable to extract the textural information over as large an area as possible. However, this leads to the 'window problem': one cannot be sure whether the window over which the feature is extracted lies entirely inside a region, or whether it is extracting a 'mutant' value (i.e., confusing a mixture of two textures with a single new texture) because it overlaps regions.*

A general segmentation system will need the ability to extract such information over varying window sizes. The selection of the proper size for the 'receptive field' must surely be a dynamic decision (and sometimes could be provided by feedback from the interpretive process).

*This problem is related to the 'mixed pixel' problem. When an image is first scanned, a pixel could lie on the boundary between distinct visual areas. This would produce a value between the values that would be produced for pixels lying entirely on either side of the boundary.
One can structure the system to analyze sets of increasing window sizes (e.g., $2^n \times 2^n$, $n = 1, 2, \dots$) in some hierarchical manner (Rosenfeld and Thurston [1971], Marr [1975]) so that the correct size is sure to be included. One then must deal with the problem of automatically selecting the relevant data or maintaining all of it in some multi-level data structure. Although the problems become quite tricky, they do not seem insurmountable; however, such system structures are still not well understood.
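To make the window problem concrete, here is a minimal Python sketch. It assumes square windows of side $2^n$ aligned to a grid of non-overlapping blocks (our assumption; the text does not fix the placement) and uses the mean gray level as the feature; `window_means` is a hypothetical helper. On a two-texture image the small windows return the value of the region containing the point, while the largest window straddles the boundary and returns a 'mutant' mixture.

```python
def window_means(image, row, col, max_level):
    """Mean gray level of the aligned 2^n x 2^n block containing (row, col), n = 1..max_level."""
    means = []
    for n in range(1, max_level + 1):
        size = 2 ** n
        # Top-left corner of the aligned block that contains the point.
        r0, c0 = (row // size) * size, (col // size) * size
        block = [image[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
        means.append(sum(block) / len(block))
    return means


# Two flat "textures": columns 0-3 have value 0, columns 4-7 have value 10.
image = [[0] * 4 + [10] * 4 for _ in range(8)]
print(window_means(image, 0, 1, 3))  # [0.0, 0.0, 5.0]
```

The 8 × 8 window reports 5.0, a value that belongs to neither region.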

The hierarchical processing cone structure (Hanson and Riseman [1974]) might allow an integrated attack upon these problems. Extraction of features over varying size windows is implicit in the design of the system. The processing cone is a simulation of a parallel array of microcomputers that is hierarchically organized into layers of decreasing resolution ($256^2, 128^2, 64^2, \dots, 1^2$). Sequences of operations allow full-resolution image data to be transformed, compressed in amount, and stored at higher levels of the cone as coarse-resolution features of the subareas below. This allows both local and global features to be available simultaneously. Coarse descriptions of major areas might be utilized to guide the formation of more refined representations by merging atomic areas at lower levels. In this way the cone allows the system to work at both levels of description, either independently or dependently, but finally with the goal of bringing the local and global descriptions together.
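The data flow of the cone can be imitated sequentially. In the following Python sketch, 2 × 2 averaging stands in for the cone's "sequences of operations" (an assumption; the cone supports many other reduction operations), and `build_cone` is a hypothetical name. Each level holds coarse-resolution features of the subareas below it, and the top level holds a single global value.

```python
def build_cone(image):
    """Levels of a resolution pyramid, finest first.

    Each level halves the resolution by averaging non-overlapping 2x2 blocks
    until a single 1x1 value (the global mean) remains. Assumes a square image
    whose side is a power of two.
    """
    levels = [image]
    while len(levels[-1]) > 1:
        below = levels[-1]
        half = len(below) // 2
        above = [
            [
                (below[2 * r][2 * c] + below[2 * r][2 * c + 1]
                 + below[2 * r + 1][2 * c] + below[2 * r + 1][2 * c + 1]) / 4
                for c in range(half)
            ]
            for r in range(half)
        ]
        levels.append(above)
    return levels


image = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
cone = build_cone(image)
print([len(level) for level in cone])  # [4, 2, 1]
print(cone[1])  # [[2.5, 4.5], [10.5, 12.5]]
print(cone[2])  # [[7.5]]
```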

An interesting approach to the recursive embedding of texture characteristics has recently been suggested by Ehrich and Foith [1975, 1976]. A versatile data structure for extracting the relationships between intensity peaks and valleys of a one-dimensional scan line, called a 'relational tree', has been developed. The description of peaks and subpeaks (which could represent microtextures within a macrotexture) in terms of their width and relative heights can be extracted from the waveform rather simply.

The relational tree captures structural information in a hierarchical fashion; fine texture appears as 'frontier peaks' embedded in coarse (wide) peaks representing more global textural characteristics. The two-dimensional case becomes somewhat more complicated, since the structure of distinct scan lines in the same or different orientations must be correlated. The approach appears to be quite powerful for classification and description, and bears promise for applications to segmentation.
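Ehrich and Foith's relational tree is not reproduced here. As a hedged stand-in that shows the same idea of peaks nested within peaks, the Python sketch below builds a simple level-set (component) tree of a scan line: each node is a run of samples at or above some gray level, and its children are the runs at or above the next higher level present. Leaves play the role of 'frontier peaks'. This is a different, simpler construction than the relational tree, and all function names are hypothetical.

```python
def runs_at_or_above(profile, level, lo, hi):
    """Maximal runs of indices in [lo, hi] where profile[i] >= level."""
    runs, start = [], None
    for i in range(lo, hi + 1):
        if profile[i] >= level:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, hi))
    return runs


def peak_tree(profile, lo=0, hi=None, level=None):
    """Node = run at or above `level`; children = runs at or above the next higher level."""
    if hi is None:
        hi = len(profile) - 1
    if level is None:
        level = min(profile)
    node = {"level": level, "start": lo, "end": hi, "children": []}
    higher = sorted({v for v in profile[lo:hi + 1] if v > level})
    if higher:
        for start, end in runs_at_or_above(profile, higher[0], lo, hi):
            node["children"].append(peak_tree(profile, start, end, higher[0]))
    return node


def frontier_peaks(node):
    """Leaves as (height, start, end): the finest, innermost peaks."""
    if not node["children"]:
        return [(node["level"], node["start"], node["end"])]
    return [leaf for child in node["children"] for leaf in frontier_peaks(child)]


# Two narrow peaks riding on one wide peak, plus an isolated low peak.
print(frontier_peaks(peak_tree([0, 3, 4, 3, 4, 3, 0, 1, 0])))
# [(4, 2, 2), (4, 4, 4), (1, 7, 7)]
```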

First- and Higher-Order Statistics


One approach to texture description uses first- and higher-order statistics of (monochromatic) scene elements (Julesz [1975]). The first-order statistic is simply the average gray level of an area, and differences in this parameter have been widely employed in previous work. The computation of the second-order statistic for an area requires the determination of the likelihood of finding gray levels i and j for pairs of points as a function of both the length and orientation of a line between them. Third-order statistics are extracted as a function of the relationships between three-tuples of points.

The information in second-order statistics is precisely the data contained in the 'gray level adjacency' matrices which have been studied by Haralick [1973] and Rosenfeld & Troy [1970]. For a given length and orientation, a square matrix of the co-occurrences of gray level i with gray level j in the defined relationship must be constructed. This technique has been used effectively for classification of texture samples by transforming each matrix of values into a scalar value, computing features such as the angular second moment about the diagonal (ASMD).

It is interesting to note that the ASMD can be computed locally in parallel using little intermediate storage in the processing cone (Hanson & Riseman [1974]).
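A gray-level co-occurrence matrix and a scalar feature derived from it can be sketched in a few lines of Python. As an assumption, the feature computed here is Haralick's standard angular second moment (the sum of squared matrix entries), not necessarily the 'about the diagonal' variant (ASMD) named above, whose exact definition we do not reproduce. The helper names are hypothetical. Note that with G gray levels each displacement already yields a G × G matrix (4,096 entries for G = 64), which is the data-volume concern raised below.

```python
def cooccurrence(image, dr, dc, levels):
    """Normalized co-occurrence matrix for the displacement (dr, dc).

    Entry [i][j] is the fraction of point pairs (p, p + (dr, dc)), both inside
    the image, where p has gray level i and p + (dr, dc) has gray level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def angular_second_moment(p):
    """Haralick's angular second moment: the sum of squared matrix entries."""
    return sum(v * v for row in p for v in row)


image = [[0, 0, 1],
         [0, 0, 1],
         [1, 1, 1]]
p = cooccurrence(image, 0, 1, levels=2)  # right-hand neighbour at distance 1
print(angular_second_moment(p))  # about 0.333: three equally likely (i, j) pairs
```

A perfectly uniform area concentrates all pairs in one entry and gives the maximum value 1; busier textures spread the pairs out and lower it.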

In a series of interesting experiments, Julesz et al. [1973] and Julesz [1975] demonstrated that two textures with identical first- and second-order statistics but different third-order statistics cannot be spontaneously discriminated by a human observer, while differences in first- or second-order statistics generally allow spontaneous discrimination. They showed that textures with these characteristics could be constructed by performing simple transformations of the texture elements.* This would imply that if the first- and second-order statistics were extracted from a texture, these often could be used to determine boundaries of the textured area.

Unfortunately, the general use of second-order statistics is not computationally viable. For purposes of segmentation, the amount of data that would have to be collected to determine the similarity or difference of general second-order statistics of unknown areas of arbitrary size, shape, and placement is enormous. Thus, segmentation based on extracting a full range of second-order statistics seems doomed to failure. However, the use of selected features that depend only on second-order statistics could prove quite fruitful.

*It should be noted that this study was constrained to black and white binary images. Although the extrapolation to more general scenes is reasonable, it should be done with caution.


Boundary Formation
There are many ways to form a description of a scene in terms of a line drawing. There are several intermediate representations of boundaries that often are formed prior to obtaining the final representation. An estimate of the strength (and sometimes orientation) of the gradient of intensity can be obtained via the application of a spatial differentiation operator. The transformed image is composed of independent edges whose spatial relationships, among other things, can be used to infer more global entities. Optionally, these edges might be filtered to remove redundant and/or less important edges. Then, a subset of edges might be linked into line segments; in some domains these segments might be restricted to linking edges that either form a straight line or observe certain constraints on edge orientation. Finally, the line segments might be grouped together in terms of the standard ways lines may come together at vertices or in terms of more complete boundaries.

Much of the early research in scene analysis was based on techniques for tracking straight lines (Roberts [1965], Binford & Horn [1971]). If the objects under consideration were polyhedra, then knowledge about their vertices could also be employed during or after the formation of straight line segments (Roberts [1965], Clowes [1971], Huffman [1971], Shirai [1972], Duda & Hart [1973], Waltz [1975]). It should be evident that in natural scenes these techniques will be quite limited if utilized alone. More general nonsemantic procedures for binding local edges into longer segments are needed. An interesting approach that the reader should be aware of, but that we will not examine here, involves an understanding of surfaces, their orientation, and the light reflected from them (Mackworth [1973], Horn [1975]).
There are a variety of techniques and control strategies that can be used to form edge representations. Edges can be sequentially tracked along points of roughly uniform gradient strength. On the other hand, edge information can be extracted prior to sequential or parallel binding of edges into line segments. Collinear edges can produce clusters in feature space via Hough-like transforms (Duda and Hart [1973]). This can allow groups of edges with similar properties to be globally analyzed and provide local direction to the control of boundary formation (O'Gorman and Clowes [1973], Nevatia [1975], Shapiro [1975], Wechsler and Sklansky [1975]). The latter approaches bear similarity to techniques for region analysis presented later in this paper, and we hope the reader can extrapolate their potential by considering the general utility of global feature analysis in forming regions.
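A Hough-style accumulator for straight lines can be sketched in Python as follows. We assume the common normal parameterization $\rho = x\cos\theta + y\sin\theta$, integer rounding of $\rho$, and a coarse 15° angle quantization; none of these choices is prescribed by the works cited above, and `hough_votes` is a hypothetical helper. Collinear edge points pile their votes into a single bin, which is the global clustering the paragraph describes.

```python
import math
from collections import Counter


def hough_votes(points, angle_step_deg=15):
    """Vote each edge point (x, y) into (theta, rho) bins, rho = x cos(theta) + y sin(theta)."""
    votes = Counter()
    for x, y in points:
        for theta in range(0, 180, angle_step_deg):
            t = math.radians(theta)
            rho = round(x * math.cos(t) + y * math.sin(t))
            votes[(theta, rho)] += 1
    return votes


# Ten collinear edge points on the line y = 2, plus two stray points.
points = [(x, 2) for x in range(10)] + [(3, 7), (8, 0)]
votes = hough_votes(points)
best_bin, best_count = max(votes.items(), key=lambda item: item[1])
print(best_bin, best_count)  # (90, 2) 10
```

The winning bin (θ = 90°, ρ = 2) describes the line y = 2; the stray points only scatter a few isolated votes.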

One cannot expect a low-level system to directly provide all final boundary representations which might be meaningfully interpreted by a semantic processor. The search space is enormous, and constraints upon the final representation are almost always embedded in the techniques and control strategies. Sometimes, in cases of uncertainty, the goal of forming a single final representation can be relaxed, and the determination of a consistent representation can be delayed for other processes which utilize different types of knowledge. Thus, we will limit ourselves to examining variations on two approaches to the early stages of finding lines. A survey on edge detection techniques by Davis [1974] focuses upon other issues and approaches, providing a nice complement to the treatment in this paper.


4.1 Spatial Differentiation


As we have stated, the usual first step in computing boundaries is the application of a spatial differentiation operator (often defined as an edge mask or template) to transform the original image into one with edges highlighted. Although many such operators have been suggested (Hueckel [197?], Bullock [1974], Fram and Deutsch [1975], McKee and Aggarwal [197?], Marr [1975]), one that combines low complexity with high reliability is an operator (Kirsch [1971]) computed on the local window shown in Figure 3(a) as follows. Let $m_i$ ($i = 0, \dots, 7$) be the response of the $i$-th of eight directional masks, each defined on the neighbors of X in Figure 3(a), with indices computed modulo 8; then let

$$S(X) = \max_i m_i \quad \text{and} \quad D(X) = \{\, i \mid m_i \text{ is maximal} \,\}.$$


This gives, at every point X, estimates of the gradient strength S(X) and gradient direction D(X) (quantized to 45° intervals). Later we will show that it is useful to save the sign of the gradient; this will tell us which sides are light and dark as we move across an edge of a given contrast.

If X is within a uniform area, S(X) = 0 and the orientation of an edge is meaningless, whereas D(X) is defined, but not necessarily unique, in all other cases. Actually, D(X) only encodes four unique orientations. For each orientation, though, information is available as to which side of point X is the best fit of the edge; an example of the two placements of a vertical boundary is shown in Figure 3(b).
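The operator's defining equations did not survive in our copy of the text, so the following Python sketch uses the standard Kirsch compass formulation, $m_i = |5s_i - 3t_i|$, where $s_i$ sums three consecutive neighbors $a_i, a_{i+1}, a_{i+2}$ of X and $t_i$ sums the remaining five, indices modulo 8. Whether this matches the authors' exact variant is an assumption, as are the neighbor ordering and the function name. Besides S(X) and D(X), it returns the sign of the winning response, which the suppression steps below make use of.

```python
# Offsets (row, col) of the eight neighbours of X, clockwise from the north-west corner.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def kirsch(window):
    """Kirsch-style compass operator on a 3x3 window (a list of three rows).

    Returns (S, D, sign): S is the largest |5*s_i - 3*t_i| over i = 0..7, D lists
    every i attaining S, and sign is +1 when the three-neighbour side of mask D[0]
    is the brighter side, else -1.
    """
    a = [window[1 + dr][1 + dc] for dr, dc in NEIGHBOURS]
    total = sum(a)
    responses = []
    for i in range(8):
        s = a[i] + a[(i + 1) % 8] + a[(i + 2) % 8]
        responses.append(5 * s - 3 * (total - s))
    strength = max(abs(v) for v in responses)
    if strength == 0:
        return 0, [], 0  # uniform window: the orientation is meaningless
    directions = [i for i, v in enumerate(responses) if abs(v) == strength]
    sign = 1 if responses[directions[0]] > 0 else -1
    return strength, directions, sign


step = [[10, 0, 0],
        [10, 0, 0],
        [10, 0, 0]]
print(kirsch(step))                          # (150, [6], 1): bright west side
print(kirsch([[7] * 3 for _ in range(3)]))   # (0, [], 0)
```

Note that the centre pixel X itself never enters the computation.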


4.2 Suppression of Redundant Data

A disadvantage of most spatial differencing operators is that multiple indications of the same line can be produced. The raw digitized data sometimes introduces a gradient of brightness which is not a step function when one is expected. In the house scene of Figure 1(d), the sharp boundary between sky (intensity = 52) and roof (intensity = 33) actually has one intermediate row of transition values (intensity = 46). This problem is related to the placement and size of the scanning point, which might overlap the areas (the problem of "mixed pixels"). In many cases there is a ramp function in the data because of shadowing and highlights. Thus, many different window placements will redundantly detect a boundary, whereas the goal is to find a single line which best separates the two areas.
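A one-dimensional check shows why a single transition row already yields redundant responses. This Python sketch uses a plain central difference as a stand-in for the 3 × 3 operator (an assumption made only to keep the arithmetic visible) on a vertical profile through the sky/roof boundary, with the intensities quoted above.

```python
def central_differences(profile):
    """|f(x+1) - f(x-1)| at each interior sample of a 1-D intensity profile."""
    return [abs(profile[x + 1] - profile[x - 1]) for x in range(1, len(profile) - 1)]


# Sky (52), one transition row (46), then roof (33), as in the house scene.
profile = [52, 52, 46, 33, 33]
print(central_differences(profile))  # [6, 19, 13]
```

Three consecutive samples respond to what is a single boundary; suppression must keep only the strongest (19) without breaking the line.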

Figure 3. A spatial differentiation operator. (a) The strength S(X) and the orientation D(X) of the gradient at X. (b) Placement of edge with respect to X. (c) Edges which are logically equivalent can be formed at adjacent points. (d) Non-maxima suppression could cause fragmentation. (e) Shifting edges can standardize their location. (f) Directions for non-maxima suppression of edges. (g) Suppression operates only for edges with the same sign of the gradient, so that one-pixel-wide regions can be detected.

In the case of the specific operator we have introduced, an additional problem of multiple representations of the same boundary occurs. If an edge occurs on one side of point X, the edge will be detected when the 3 × 3 window is centered on some of the points adjacent to X. A simple case is depicted in Figure 3(c), where a vertical line is detected as equivalent adjacent boundaries with equal strength and orientation. Any single point appears in nine different window placements, and any pair of adjacent points appears in six different window placements. There is a confusing overlap of edge analysis. In areas to either side of an edge where there is some (possibly minor) variation, the strengths (and even the orientations) of the adjacent edges may not be identical. Clearly it is desirable to have only a single indication of a boundary, and techniques for cleaning up this information are called for. However, as redundant and weaker edges are removed, the condition of Figure 3(d) must be avoided; the suppression of local edges should not lead to global fragmentation of a line. Two operations will be employed to enhance the meaningful information: removal of logically equivalent edges and suppression of non-maximum strength edges.
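The window counts quoted above can be verified by brute force. This Python sketch (the helper name is hypothetical) counts the centers of 3 × 3 windows whose window covers every point of a given set.

```python
from itertools import product


def windows_containing(points, half=1):
    """Count centres of (2*half+1)-square windows that cover every given point."""
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    # Only centres within `half` of every point can qualify.
    centres = product(range(max(rows) - half, min(rows) + half + 1),
                      range(max(cols) - half, min(cols) + half + 1))
    return sum(1 for cr, cc in centres
               if all(abs(r - cr) <= half and abs(c - cc) <= half for r, c in points))


print(windows_containing([(0, 0)]))          # 9
print(windows_containing([(0, 0), (0, 1)]))  # 6
```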

The representation can be simplified by adopting a standard position for edges at a given orientation, thereby eliminating separate indications of logically equivalent edge positions. Currently, a pair of parallel edges at adjacent pixels can represent a variety of situations: edges which are two pixels apart (and probably distinct), edges one pixel apart, or edges actually in the same position. By adopting a general convention of shifting edges, pairs of adjacent boundaries can be collapsed into a single representative in a consistent manner.

The standard positions that we have selected for the four orientations of our operator are shown in Figure 3(e). Each of the four orientations associated with a pixel now has a fixed position relative to the pixel: an edge can only appear in the north to southeast semicircle about a pixel. The edges in non-standard position can be uniquely shifted to the pixel which has that edge in a logically equivalent, but standard, position.

The neighboring edges which could shift into a single pixel are also shown in Figure 3(e). In the end there are still 8 edge values competing at four unique locations, with the maximum surviving. This could be computed directly at the first application of the edge masks, allowing the 8 possibilities to compete directly rather than be distributed among 5 pixels (and, of course, competing with many edges in other positions) before their results are collected into the single pixel.

For many edge operators suggested in the literature, suppression techniques are limited (or noisy) because information which encodes the placement of the edge with respect to the pixel is not available. The suppression schemes must focus upon the strength and orientation of these boundaries in order to clean up the edge image. Various thinning and smoothing techniques have been suggested. Rather than review this body of literature, we will examine only the techniques for suppressing non-maxima (Rosenfeld & Thurston [1971], Hayes et al. [1974]) of edges and spots. They seem to be directed towards the heart of the problem: a local analysis which retains a good fit of an edge and suppresses redundant data.*

Suppression can take place by having each local edge examine the strength and orientation of the edges in its neighborhood. It will be suppressed by indications of parallel (or possibly, near-parallel) lines of greater strength nearby. The simplest heuristic scheme given our edge representation is portrayed in Figure 3(f), where an edge at some orientation can be suppressed by a stronger parallel edge which is adjacent in a direction perpendicular to the orientation of the first; this means that for each edge, we must examine the values of exactly two of its neighbors. There are many other heuristics that can be employed as a function of strength and orientation; e.g., edges at 45° angles to each other might also activate suppression, or might require a different threshold factor of relative strength before suppression will succeed.

*This process can actually proceed prior to grouping local edges into a more global line. However, a single straight line which is globally the best fit might be useful in directing the local analysis, and this is an argument for delaying such local suppression.

Finally,  there  is  a problem  with  this  suppression  scheme  in  that  one 
pixel  wide  regions  will  produce  parallel  edges  which  would  suppress  each 
other.  Here  we  can  employ  the  sign  of  the  gradient  to  discriminate  between 
distinct  boundaries  as  depicted  in  the  example  of  Figure  3(g).  Only 
parallel  edges  having  the  same  gradient  sign  can  be  multiple  instances 
of  the  same  edge;  suppression  will  not  take  place  otherwise. 
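The Python sketch below combines the two rules just described: an edge is suppressed only by a strictly stronger parallel edge in one of its two perpendicular neighbors, and only when both edges have the same gradient sign. It assumes per-pixel grids of strength, orientation (0°, 45°, 90°, 135°, measured counter-clockwise from horizontal with rows increasing downwards) and sign; these conventions, the function name, and the choice to let equally strong neighbors both survive are our assumptions, not details given in the text.

```python
# Perpendicular neighbour offsets (row, col) for each edge orientation in degrees.
PERPENDICULAR = {
    0: ((-1, 0), (1, 0)),     # horizontal edge: compare with the pixels above and below
    90: ((0, -1), (0, 1)),    # vertical edge: compare with the pixels left and right
    45: ((-1, -1), (1, 1)),   # edge running up-right: compare along the other diagonal
    135: ((-1, 1), (1, -1)),  # edge running up-left: compare along the other diagonal
}


def suppress_non_maxima(strength, orient, sign):
    """Keep an edge unless a same-sign parallel edge beside it is strictly stronger."""
    rows, cols = len(strength), len(strength[0])
    kept = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            s, o = strength[r][c], orient[r][c]
            if s <= 0 or o is None:
                continue  # no edge at this pixel
            suppressed = False
            for dr, dc in PERPENDICULAR[o]:
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and orient[rr][cc] == o
                        and sign[rr][cc] == sign[r][c]
                        and strength[rr][cc] > s):
                    suppressed = True
            if not suppressed:
                kept[r][c] = s
    return kept


# A row of vertical edges across a blurred step: only the strongest survives.
print(suppress_non_maxima([[0, 5, 9, 4, 0]], [[None, 90, 90, 90, None]], [[1, 1, 1, 1, 1]]))
# [[0, 0, 9, 0, 0]]
# A one-pixel-wide bright line: the two edges have opposite signs, so both survive.
print(suppress_non_maxima([[0, 9, 7, 0]], [[None, 90, 90, None]], [[1, 1, -1, 1]]))
# [[0, 9, 7, 0]]
```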

Figure 4 shows a differentiated portion of subimage A in Figure 1(d) (the diagonal roof and door area in the house); Figures 4a and 4b represent S(X) and D(X) before suppression. Note that in Figure 4a S(X) has been scaled by a constant factor, and the sign of the gradient has been included to guide the later stages of suppression. For simplicity the four orientations of edges are represented graphically in Figure 4b, even though this leaves ambiguous the exact position of each edge in the diagram. In Figures 4c and 4d, the edges have been moved to standard positions so that their position relative to the pixel is between north and (moving to the right) southeast. Now adjacency and contact of boundaries are clearer.
Figures 4e and 4f show the results of first suppressing non-maxima and then thresholding out weak edges; the thresholding is carried out by computing the mean μ and standard deviation σ of the non-zero S(X) and removing edges whose strength is below μ + kσ (here k = -0.25). Note the presence of the one-pixel-wide light vertical region in the bottom center of the image. The important boundaries stand out clearly, but there are still some spurious and redundant edges along some boundaries or at vertices. These edges are a result of the complexities introduced by many of the edge formation windows overlapping a boundary in various ways. Many of them can be removed by using a suppression pass that is slightly more sophisticated.
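Assuming non-negative strengths and the population standard deviation (the text does not say which estimator was used), the thresholding step can be sketched in Python as follows; `threshold_edges` is a hypothetical name.

```python
from statistics import mean, pstdev


def threshold_edges(strength, k=-0.25):
    """Zero out edges weaker than mu + k*sigma, computed over the non-zero strengths."""
    nonzero = [s for row in strength for s in row if s != 0]
    if not nonzero:
        return [row[:] for row in strength]
    cutoff = mean(nonzero) + k * pstdev(nonzero)
    return [[s if s >= cutoff else 0 for s in row] for row in strength]


strength = [[0, 2, 4],
            [6, 8, 0]]
print(threshold_edges(strength))  # [[0, 0, 0], [6, 8, 0]]
```

Here μ = 5 and σ = √5 ≈ 2.24, so the cutoff is about 4.44. Because k is negative, the cutoff sits slightly below the mean, which keeps somewhat more than half of the surviving edges in typical distributions.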

4.3 Relaxation Processes for Boundary Formation

All of the preceding considerations might be generalized and embodied in a process of competition and cooperation within the parallel 'relaxation' procedure formulated by Rosenfeld, Hummel & Zucker [1976]. Using this approach there has been an exciting range of applications, from boundary analysis (Zucker, Hummel & Rosenfeld [1975], Vanderbrug [1975]) to template matching (Davis & Rosenfeld [1976]). This approach of distributed computation overlaps earlier ideas, including the spring-loaded templates used to flexibly map parts into a whole (Fischler and Elschlager [1973]), "constraint satisfaction" applied to labelling vertices of polyhedra with shadows (Waltz [1975]), and the formation of a consistent set of labels for the identities of regions by Tenenbaum and Barrow [1976].

Here we will briefly review the general idea while applying these ideas of distributed computation to boundary formation. This approach can embody not only non-maxima edge suppression but also edge fitting and binding. The advantage of the relaxation techniques is that the likelihoods of all orientations of each adjacent point can contribute to the label assigned to a given point, not just the 'best' choice for adjacent elements. Whereas in the previous algorithms the strengths of edges at nonoptimal orientations are thrown away, they now prove very useful. Thus, a break in a long, horizontal line might be repaired automatically by the context. We present our own variant of the approach of Zucker et al. [1975].


Figure 4. (a)-(b) The strength and orientation of edges produced by applying the operator of Figure 3 to subimage A in Figure 1(d). The sign of the gradient is retained to show the relative brightness on each side of an edge. (c)-(d) Removal of logically equivalent edges by standardization of the position of edges with respect to the pixels they separate. (e)-(f) Suppression of non-maxima edges and thresholding of edges whose strength is below μ - 0.25σ.
Assume we have a set of elements $A = \{a_1, \dots, a_n\}$ and a set of labels $\Lambda = \{\lambda_1, \dots, \lambda_m\}$, where each label represents a possible interpretation for each of the elements. In this example domain the elements are the image points and the label set consists of edge orientations, with the null label used to represent the absence of an edge. A labelling $P = \{p_1, \dots, p_n\}$ is a sequence of probability vectors $p_i: \Lambda \to [0, 1]$, with $p_i(\lambda_k)$ being the probability of the hypothesis that $\lambda_k$ is the correct label for $a_i$. Shortly we will show how our spatial differentiation operator can be used to provide initial estimates of these values.

The relaxation process involves an iterated updating of these probabilities in an attempt to move $P$ towards a globally correct labelling. This is achieved by updating the value of each $p_i(\lambda_k)$ on the basis of the information in its local "neighborhood". Thus, if $a_j$ is in $N(a_i)$, the neighborhood of $a_i$, then the probability of label $\lambda_k$ at $a_i$ will be increased (decreased) by label $\lambda_\ell$ at $a_j$ if the labels are compatible (incompatible). The effect of this change on $p_i(\lambda_k)$ will be weighted by the likelihood $p_j(\lambda_\ell)$ of the influencing hypothesis. Thus, the belief in each interpretation can be strongly influenced by its context, leading to competition and cooperation between alternative interpretations of elements in a common neighborhood.

Now we only need to define the compatibility functions which specify the relationships between labels. To some extent this allows the semantics of the domain to be employed via propagation of local influences in arriving at a global interpretation. We define the compatibility function between $a_i$ and $a_j$ as

$$r_{ij}: \Lambda \times \Lambda \to [-1, 1]$$

such that

$$r_{ij}(\lambda_k, \lambda_\ell) \begin{cases} > 0 & \text{if } \lambda_k \text{ and } \lambda_\ell \text{ are compatible,} \\ < 0 & \text{if } \lambda_k \text{ and } \lambda_\ell \text{ are incompatible,} \\ = 0 & \text{if } \lambda_k \text{ and } \lambda_\ell \text{ are independent.} \end{cases}$$

Here we use the term compatible in the sense of the phrase "lends support to". For edge labelling, the compatibility of edge orientations must capture both the types (orientations) of edges and their spatial relationship.
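Finally, we have the basic idea of updating the change in $p_i(\lambda_k)$ as

$$\Delta p_i(\lambda_k) = \sum_{a_j \in N(a_i)} d_{ij} \sum_{\ell=1}^{m} r_{ij}(\lambda_k, \lambda_\ell)\, p_j(\lambda_\ell), \qquad i = 1, \dots, n, \quad k = 1, \dots, m,$$

where $d_{ij}$ is a weighting of the influence of the various $a_j$ upon $a_i$.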
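Let us denote the probability of a label $\lambda_k$ after the $t$-th iteration as $p_i^{(t)}(\lambda_k)$. Since $p_i + \Delta p_i$ can become negative for a label with strong negative evidence from its context, the updating will be nonlinear, as follows:

$$p_i^{(t+1)}(\lambda_k) = p_i^{(t)}(\lambda_k)\,\bigl[1 + \Delta p_i^{(t)}(\lambda_k)\bigr],$$

with $\Delta p_i$ remaining in the interval from $-1$ to $+1$.

We now modify the equation to normalize the updated values across $k = 1, \dots, m$ in order to maintain a probability vector:

$$p_i^{(t+1)}(\lambda_k) = \frac{p_i^{(t)}(\lambda_k)\,\bigl[1 + \Delta p_i^{(t)}(\lambda_k)\bigr]}{\sum_{\ell=1}^{m} p_i^{(t)}(\lambda_\ell)\,\bigl[1 + \Delta p_i^{(t)}(\lambda_\ell)\bigr]}.$$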

This updating process can be iterated some number of times, converging upon a locally consistent interpretation which hopefully is a globally acceptable interpretation. Some results on the convergence of this process are provided by Zucker, Krishnamurthy and Haar [1976].
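The update equations of this section translate directly into code. The Python sketch below performs one synchronous iteration for every element. The data layout, the function names, the toy compatibility function, and the assumption that each element's weights $d_{ij}$ sum to one (which is what keeps $\Delta p_i$ in $[-1, 1]$) are ours.

```python
def relaxation_step(p, neighbours, r):
    """One synchronous relaxation-labelling update for all elements.

    p[i][k]        probability of label k at element i (each p[i] sums to 1)
    neighbours[i]  list of (j, d_ij) pairs; assumed: the d_ij for each i sum to 1
    r(i, j, k, l)  compatibility of label k at i with label l at j, in [-1, 1]
    """
    m = len(p[0])
    new_p = []
    for i, p_i in enumerate(p):
        # delta[k] = sum_j d_ij * sum_l r_ij(k, l) * p_j(l); within [-1, 1] under the assumptions.
        delta = [
            sum(d * sum(r(i, j, k, l) * p[j][l] for l in range(m)) for j, d in neighbours[i])
            for k in range(m)
        ]
        unnormalized = [p_i[k] * (1 + delta[k]) for k in range(m)]
        total = sum(unnormalized)
        if total == 0:
            # Every label was driven to zero; keep the old vector rather than divide by zero.
            new_p.append(list(p_i))
        else:
            new_p.append([u / total for u in unnormalized])
    return new_p


def same_label_supports(i, j, k, l):
    """Toy compatibility: equal labels support each other, different labels compete."""
    return 1.0 if k == l else -1.0


# Labels: 0 = EDGE, 1 = NO-EDGE. Two mutually neighbouring elements.
p = [[0.9, 0.1], [0.6, 0.4]]
neighbours = [[(1, 1.0)], [(0, 1.0)]]
p = relaxation_step(p, neighbours, same_label_supports)
print(p[1])  # EDGE at element 1 rises from 0.6 to about 0.93
```

Element 1's weak EDGE hypothesis is reinforced by its confident neighbour, illustrating how context repairs uncertain labels.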
4.4 Applying Relaxation to an Inter-Pixel Edge Representation

The ideas of the last section will be illustrated in a specific example. For this treatment we choose a different representation of edges among the pixels. Using the differentiation operator presented in the previous sections, the set of possible edges that can be associated with a pixel is shown in Figure 5(a); this would leave us with 5 possible labels at each point: LINE0°, LINE45°, LINE90°, LINE135°, and NULL (no line).

Here we simplify this representation by only allowing horizontal and vertical edges to be placed in the image between pairs of adjacent points, as in Figure 5(b). Rather than associating edges with pixels, we have an inter-pixel edge representation, with the location and orientation of edges represented at a local level more naturally. This type of representation has some desirable characteristics and has been used elsewhere (Brice and Fennema [1970], Yakimovsky [1976], Prager, Hanson and Riseman [1976]).

There are now only twice as many possible edges as pixels (Figure 5b), compared to four times as many before (Figure 5a). However, we will view these edges quite differently. The results presented in earlier sections allowed the four types of edges about a pixel to compete, with only the strongest surviving. Figure 5(c) demonstrates why we do not wish to make the horizontal and vertical edges around a point mutually exclusive: both should be present for diagonal boundaries.¹

This leads us to view each horizontal and each vertical edge in our current representation as a distinct element a_i in the set of elements A. For each element there are only two labels to be associated with it, EDGE and NO-EDGE. Since the relaxation scheme that we have described demands that the probabilities of the labels sum to one, the situation simplifies nicely: only one probability, P(EDGE), need be stored to represent the likelihood of the two labels.

¹Note that higher level processes will now be required to detect the global characteristics of a straight line at any orientation other than horizontal or vertical.

Before discussing the compatibility coefficients and the manner in which the labels will be updated, we will adapt our differentiation operator of Figure 3 to the new situation. Figure 5(d) demonstrates the computation of the strength S(E_i) of an edge E_i (in this case vertical) as the maximum of the outputs of the two masks associated with placing an edge in the given position. Now let us use the strength of the globally strongest edge in the image,

$$S_{MAX} = \max_{E_i \in \text{image}} S(E_i),$$

to convert each S(E_i) into the probability of EDGE at location i (and consequently to determine the probability of NO-EDGE):

$$P_i(\text{EDGE}) = \frac{S(E_i)}{S_{MAX}}, \qquad P_i(\text{NO-EDGE}) = 1 - P_i(\text{EDGE}).$$

Thus, the probability of an edge will approach 1 only at the strongest edges in the image.
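The conversion from edge strength to P(EDGE) is easy to sketch. Because the masks of Figure 3 are not reproduced in this section, the code below substitutes a plain absolute intensity difference across each inter-pixel edge for S(E_i); that stand-in, the function names, and the array layout are assumptions for illustration only.

```python
def edge_strengths(img):
    """Inter-pixel edge strengths for a 2-D list of intensities (at least 2 x 2).

    vertical[r][c]   -- edge between pixels (r, c) and (r, c + 1)
    horizontal[r][c] -- edge between pixels (r, c) and (r + 1, c)
    Assumption: |intensity difference| stands in for the Figure 3 mask outputs.
    """
    rows, cols = len(img), len(img[0])
    vertical = [[abs(img[r][c + 1] - img[r][c]) for c in range(cols - 1)]
                for r in range(rows)]
    horizontal = [[abs(img[r + 1][c] - img[r][c]) for c in range(cols)]
                  for r in range(rows - 1)]
    return vertical, horizontal


def edge_probabilities(vertical, horizontal):
    """P(EDGE) = S(E_i) / S_MAX for every edge; P(NO-EDGE) is 1 - P(EDGE)."""
    s_max = max(max(row) for row in vertical + horizontal)
    if s_max == 0:
        s_max = 1  # flat image: every edge probability is 0

    def scale(grid):
        return [[s / s_max for s in row] for row in grid]

    return scale(vertical), scale(horizontal)


vert, horiz = edge_probabilities(*edge_strengths([[0, 0, 8], [0, 4, 8]]))
```

Only one number per edge is kept, in line with the two-label simplification above.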

Only the specification of the compatibility functions remains. We must define r_ij(λ_k, λ_l) to cause suppression of redundant lines and strengthening of weak or incorrect lines. Generally these are intuitively specified as heuristic weights. Let us consider the types of weights on the neighborhood of surrounding labels which should influence the likelihood of a horizontal label.
Figure 5:  An inter-pixel edge representation for relaxation.

(a) Competing orientations of an edge for each pixel in the previous representation. (b) Both horizontal and vertical edges about a pixel will be allowed in the new representation. (c) For complete diagonal boundaries both horizontal and vertical edges at a pixel are required. (d) The modified edge operator is the maximum strength of the two placements of masks. (e) The labels in the neighborhood of a horizontal edge which might be used to update the probability of a horizontal edge; note that the null label is depicted by [ ].
The labels shown in Figure 5(e) are the only labels that will be allowed to affect the probability of the horizontal label in the center of a 3 x 3 window. Horizontal edges "a" to the left and right of a horizontal label represent the continuation of a horizontal line and should support the likelihood of that label by a positive coefficient; the null label "b" to the left and right should have a negative weight because the edge does not continue. Vertical edges "c" should have positive weights since they represent a consistent extension of a horizontal edge. Horizontal edges "d" above and below call for suppression, hence a negative weight. Finally, the presence of a null label "e" above or below a horizontal edge might be considered as supporting evidence for that horizontal label, and its weight would be positive. The size of the weights employed represents one's heuristic estimate of the relative compatibility of the label of point j with the horizontal label of point i. Specification of the vertical label can be derived by symmetry (a 90° rotation followed by a mirror image transformation).
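The sign pattern just described can be written down directly. In the sketch below only the signs come from the text; the magnitudes in `WEIGHTS` are hypothetical, the neighbors are assumed to have been gathered already into the three groups of Figure 5(e), and the final clamp to [-1, 1] is a safeguard of our own. Because P(NULL) = 1 - P(EDGE), each neighbor contributes through a single stored probability. The same function would serve a vertical element once its neighbors are gathered through the 90° rotation and mirror image.

```python
# Hypothetical weight magnitudes; only the signs follow the text.
WEIGHTS = {
    "a": 0.4,   # horizontal EDGE to the left/right: the line continues
    "b": -0.4,  # NULL to the left/right: the line does not continue
    "c": 0.2,   # vertical EDGE at either end: consistent extension
    "d": -0.3,  # horizontal EDGE above/below: redundant, suppress
    "e": 0.1,   # NULL above/below: supports a single thin edge
}


def delta_p_horizontal(p_sides, p_end_verticals, p_above_below, w=WEIGHTS):
    """Change in P(EDGE) of a horizontal element from its neighbors' P(EDGE) values."""
    delta = 0.0
    for p in p_sides:                 # positions "a" (EDGE) and "b" (NULL)
        delta += w["a"] * p + w["b"] * (1.0 - p)
    for p in p_end_verticals:         # positions "c"; NULL there is given weight 0
        delta += w["c"] * p
    for p in p_above_below:           # positions "d" (EDGE) and "e" (NULL)
        delta += w["d"] * p + w["e"] * (1.0 - p)
    # Keep 1 + delta non-negative for the multiplicative update (our choice).
    return max(-1.0, min(1.0, delta))
```

For example, a horizontal edge with certain continuations on both sides and nothing above or below receives the maximum support, while an isolated edge is pushed down.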

The correlations for updating the null label can be heuristically specified in a similar fashion, but they are difficult to specify as a set of linearly independent contributions.² We will address this question again shortly. Here, setting r_ij = 0 for the null label at all points will cause the probability of the null label to vary inversely with positive or negative changes in the evidence of an edge. Thus, we have the means of computing a change in the probabilities of the horizontal or vertical labels based on the surrounding context and, by renormalizing, of obtaining new probabilities of these edges.

²Note that Zucker, Hummel & Rosenfeld [1975] deal with the null label by setting up a competing null label process. However, the desirable weights are only clear in areas where there is no evidence of strong edges anywhere in the local context. Thus, if multiple edges for a single boundary are allowed, the null label probabilities only need to grow in areas without edges. If one is trying to carefully refine the presence of a boundary to a single thinned edge representation, the features for increasing the probability of the null label cannot easily be expressed as a set of weights for a linear function.

Figure 6 presents results of examples with different compatibility coefficients and shows various problems with the process as it has been formulated in this paper. To prevent a strong edge from being overwhelmed by the combined effect of a pair of weaker parallel edges to either side, the strength of non-maxima edges is reduced by some factor k (in our example k = 2) prior to beginning the relaxation process; i.e., all edges parallel and adjacent to a stronger edge are reduced. Figure 6(a) shows the resulting vertical and horizontal probabilities and an edge image with all edges of probability lower than .2 removed. Clearly there are incorrect edges whose probability must be lowered, while the probability of many vertical edges in the diagonal boundary should be increased.
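The non-maxima reduction applied before relaxation can be sketched as follows. The array layout matches the inter-pixel indexing assumed in the earlier edge-strength sketch and is an assumption; comparisons are made against the original strengths so the result does not depend on scan order.

```python
def reduce_non_maxima(strength, orientation, k=2.0):
    """Divide S(E) by k for every edge that is parallel and adjacent to a stronger edge.

    "vertical":   strength[r][c] is the edge between pixels (r, c) and (r, c + 1);
                  its parallel neighbors are [r][c - 1] and [r][c + 1].
    "horizontal": strength[r][c] is the edge between pixels (r, c) and (r + 1, c);
                  its parallel neighbors are [r - 1][c] and [r + 1][c].
    """
    if orientation == "vertical":
        offsets = [(0, -1), (0, 1)]
    elif orientation == "horizontal":
        offsets = [(-1, 0), (1, 0)]
    else:
        raise ValueError("orientation must be 'vertical' or 'horizontal'")
    rows, cols = len(strength), len(strength[0])
    reduced = [row[:] for row in strength]
    for r in range(rows):
        for c in range(cols):
            s = strength[r][c]
            stronger_neighbor = any(
                0 <= r + dr < rows and 0 <= c + dc < cols
                and strength[r + dr][c + dc] > s
                for dr, dc in offsets
            )
            if stronger_neighbor:
                reduced[r][c] = s / k
    return reduced
```

With k = 2, a row of vertical strengths [2, 8, 4] becomes [1.0, 8, 2.0]: the central maximum survives untouched and both weaker parallel neighbors are halved.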

Figure 6(b) shows an example set of coefficients and the results after 1 iteration, while 6(c) shows results after 6 iterations. The information cleans up, with most of the vertical spurs hanging off the diagonal boundary in 6(a) being rapidly reduced. However, the major diagonal boundaries are missing key vertical edges whose probability has also been reduced. In addition, the upper right diagonal boundary started with lower probability edges, and they are in the process of disintegrating. To combat these effects, some of the positive weights are increased in the example of Figure 6(d). The probabilities of edges after 6 iterations show many spurious edges growing stronger, while parts of the weaker boundary still disappear.
Figure 6:  The relaxation process for boundary formation.

(a) The initial probabilities of vertical and horizontal edges, and the location of edges with probability > .2. (b) An example set of weights and the probabilities after one iteration. (c) The probabilities after 6 iterations. (d) A set of coefficients which tends to grow more lines, and the results after 6 iterations. (e) The addition of a feature which is a non-linear function of a set of points in the neighborhood, and the results after 6 iterations.


[Figure 6, panels (a), (c) and (e): printed arrays of vertical and horizontal edge probabilities (initial values in (a), values on iteration 6 in (c) and (e)) over image rows 3 to 14 and columns 3 to 20, each followed by a graphic output of the edges above the threshold .20; panel (e) also gives the definition of P(unconnected edge).]


It is difficult to balance the effects of keeping the vertical edges in the diagonal boundary and the suppression of growth of spurious edges. Much of this problem is due to the limitations of using a function in which the points contribute in a linearly independent manner. Figure 6(e) shows the use of one additional factor, the probability that an edge is unconnected. For a given edge to be part of a continuing boundary, there should be at least one high probability edge emanating from each end of the given edge. If the three possible edges from each end are called e_1, e_2, e_3 and e_4, e_5, e_6, respectively, then

$$P(\text{unconnected edge}) = 1 - \max[P(e_1), P(e_2), P(e_3)] \cdot \max[P(e_4), P(e_5), P(e_6)].$$

If this probability is associated with a negative weight, it will keep spurious lines from growing off a strong edge into areas where there are only low probability edges. However, this factor is a non-linear function of the probabilities of six labels and is an extension of the theory as presented.
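The unconnected-edge factor is a direct transcription of the formula above. The second function shows one way it might be folded into a linear Δp; the weight of -0.5 is a hypothetical value, and the clamp to [-1, 1] is our own safeguard.

```python
def p_unconnected(end_one, end_two):
    """P(unconnected edge) = 1 - MAX[P(e1),P(e2),P(e3)] * MAX[P(e4),P(e5),P(e6)].

    end_one, end_two -- P(EDGE) of the three possible continuations at each end.
    """
    return 1.0 - max(end_one) * max(end_two)


def with_unconnected_penalty(delta, end_one, end_two, weight=-0.5):
    """Add the non-linear term to a linear Delta p (weight is hypothetical)."""
    return max(-1.0, min(1.0, delta + weight * p_unconnected(end_one, end_two)))
```

An edge with a strong continuation at both ends (say 0.9 and 0.8) is only weakly penalized (P = 0.28), while an edge with nothing at either end has P = 1 and receives the full negative weight.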

The result of using this negative contribution is shown in Figure 6(e). Now larger positive weights can be used on the other coefficients. The results after 6 iterations show the desired effect, with all edges in the major boundary growing stronger. However, the other diagonal boundary disappears: weak points within it made it appear disconnected, and it broke up.

If the relaxation process is only to carry out gross strengthening of boundaries, without concern for producing thick lines with multiple edges, it can probably be used quite reliably. However, if the goal is refined edges, as we have been seeking here, then it appears that contributions from independent labels will be quite difficult to tune, and contributions from sets of labels will probably be required. This area will be left for future research.
4.5  Grouping Edges into Line Segments

The problems have not been exhausted. Although the results in Figure 6 appear to be good segmentations at a macro-level, close examination may still reveal incomplete boundaries, textural edges, and noise points.

If local edges which are part of a common boundary are to be grouped into distinct line segments, then some criterion of similarity is needed. Certainly orientation is important when straight lines are being tracked, but in the general case this characteristic cannot be relied upon. If a pair of edges are of approximately equal strength, it is a strong cue that the edges should be joined. However, the regions surrounding any given region are bound to have different properties. Therefore, no matter what feature the strength of the gradient is based upon, one must expect widely varying values as the boundary of a single region is tracked.

Figure 7a depicts three regions with the edge strength based upon intensity: the strengths of the two line segments bounding a single region are quite different.

This problem calls for the goal of forming line segments, each of which lies between only one pair of regions. Then one can expect local edges to exhibit characteristics which have less variance. In addition, the comparison of features of the regions to either side of a pair of adjacent edges, Figure 7b, can be very useful in directing the edge binding process (Perkins [1976]). Notice that S_1 and S_2, which are equal in strength, have the properties of the region on one side in common, but the differing properties on their other sides lead to opposite signs on the direction of the gradient. The similarity of two edges E_1 and E_2 can now be based upon much more complete information: a comparison of (Fx_1, Fx_2) and (Fy_1, Fy_2) as well as S_1 and S_2. Thus, E_1 and E_2 can be detected as distinct segments yet retain the information that they bound a common region.


Figure 7:  Use of region information in the grouping of edges.

(a) Three adjacent regions R_α, with associated intensities I_α, α = A, B, C, produce the edge strengths shown. (b) The features to either side of a pair of edges can be used to group the edges into boundary segments. (c) Edges are grouped, and segments of boundaries are symbolically labelled with distinct numerical symbols. (d) Segments obtained across the entire image. (e) The remaining long segments after thresholding by length.


Typical results of grouping edges (Prager, Hanson & Riseman [1976]) are shown in Figure 7c. Distinct numeric labels are used to denote edges which are part of a common boundary. Note the places where a local variation caused a boundary to be divided into subparts. It is easy to join these back together by comparing the global average values of each segment (when there is not a vertex involving more than 2 segments). Then small variations will have little impact on long lines, and context again allows decisions to be made that otherwise would be quite difficult. The resulting binding of edges produces the segments of Figure 7d; segments can be thresholded on the basis of length to obtain the most reliable boundaries, as in Figure 7e. Further analysis of these procedures is available in Prager, Hanson and Riseman [1976].
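A much simplified version of edge binding and length thresholding can be sketched with a union-find structure. This is not the Prager, Hanson and Riseman procedure: the similarity test here is an assumed strength tolerance only, whereas the text also compares region features on either side and rejoins subparts by their segment averages. The function names and data layout are hypothetical.

```python
def group_edges(strength, adjacent, tol):
    """Join edges that touch end to end and differ in strength by at most tol.

    strength -- {edge_id: S(E)}
    adjacent -- iterable of (edge_id, edge_id) pairs that touch end to end
    Returns a list of segments, each a list of edge ids.
    """
    parent = {e: e for e in strength}

    def find(e):
        # Union-find root lookup with path halving.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for a, b in adjacent:
        if abs(strength[a] - strength[b]) <= tol:
            parent[find(a)] = find(b)

    segments = {}
    for e in strength:
        segments.setdefault(find(e), []).append(e)
    return list(segments.values())


def long_segments(segments, min_length):
    """Keep only segments containing at least min_length edges (cf. Figure 7e)."""
    return [seg for seg in segments if len(seg) >= min_length]
```

A chain of edges with strengths 10, 11, 30, 31, 5 and tolerance 2 splits into the segments {10, 11}, {30, 31} and {5}; a length threshold of 2 then discards the isolated edge.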

It should not be difficult to utilize region and boundary information in an integrated manner within the relaxation process. The similarity of the regions associated with contiguous edges might be a weighting factor for the mutual support of the edges. This could be used to limit the mutual development of edges to those that would be grouped into a line, and might prevent the aggregation of texture element edges into a spurious line.

There are other problems that remain. The quantization of direction by the Kirsch operator is quite crude. A straight line segment whose orientation is not a multiple of 45° might have local edges appearing as shown in Figure 8a. One is then faced with grouping these edges into a line whose slope is a continuous parameter. Marr [1975] has considered a similar problem, which can be summarized by Figure 8b. The line to be detected can be formed by grouping similar primitive elements, which could be defined by the shape of the element or by an edge of a certain orientation.

Figure 8c points out that an abstract line can be perceived by grouping a set of places, where each place is specified by an element (a line, an endpoint, or some other entity of arbitrary complexity) at some point in space. In addition, curves may have to be fit to any type of boundary; line or curve descriptors need not be restricted to the orientations of the detectors of small segments. All of these additional topics deserve careful treatment but will not be considered any further in this paper.


Figure 8:  Additional problems in boundary formation discussed by Marr [1976].

(a) The orientation of global lines may be different from the orientation of the local edges being grouped. (b) Lines formed by grouping similar primitive elements. (c) An abstract line can be formed by grouping distinguished 'places', in this case the endpoints of other lines.
The two main approaches to region formation in natural scenes, other than the indirect route of forming boundaries, are based on either merging local areas or splitting global areas, both eventually determining regions. In this section we will examine some of the fundamental properties of the problem domain that have been utilized in a few specific examples of region formation. It is argued that most of this work has focused either upon local features and two-dimensional spatial properties of the image or upon global features of the image, but that these different types of information have not been fully integrated. By integrating the types of feature activity in a scene with an analysis of their relative spatial distributions, local region formation can proceed under the guidance of a global analysis.

The discussion is complicated in some cases by the issue of semantic guidance in the region segmentation process. The lack of a global view of region properties can be compensated for by providing (the probabilities of) semantic labels for various regions, thereby allowing region merges to be blocked or made more likely. However, there is some controversy over how and whether to bring semantics to the initial segmentation of a scene; some of these questions will be considered in the approaches that use local spatial analysis and semantic guidance to merge small regions.

Let us examine three approaches to region formation. Recent articles (Zucker [1975] and Weiman [1976]) can provide the reader with additional approaches. The three efforts focus upon different characteristics of scenes:

1)  Local spatial examination of the scene: this involves the merging of local areas under syntactic (comparison of visual features of the areas) or semantic guidance; regions are built up from small pieces which have a high probability of belonging entirely to a final goal region (Brice and Fenema [1970], Yakimovsky and Feldman [1973], Tenenbaum and Weyl [1975], and Tenenbaum and Barrow [1976], Barrow and Tenenbaum [1976]).

2)  Global examination of feature distributions across the scene: here, peaks and clusters of activity in one-dimensional histograms are used to threshold the scene and recursively split the image; large pieces of the image are broken down into smaller areas until there is a high confidence that they are homogeneous under the features of interest (Ohlander [1975], Tomita, Yachida and Tsuji [1973], Schachter, Davis and Rosenfeld [1975]); and

3)  Interfacing spatial analysis with feature analysis: clusters of activity in two-dimensional histograms are used to label local areas of the scene, followed by a spatial analysis of these labels to guide the formation of the desired regions (Hanson, Riseman and Nagin [1975]).

5.1  Region Growing via Local Analysis

There has been a range of work on techniques for locally merging areas. One can break any scene into 'atomic' areas by merging all adjacent points (using either 4-neighbor or 8-neighbor adjacency) into the same region if they differ in some property by no more than a threshold θ. These algorithms are usually programmed to sequentially add points adjacent to a given region or point.³ If θ equals 0, these areas are formed in the most conservative manner possible (although even here, because of problems such as shadows, one is not assured that each of these regions lies entirely within an area encompassed by a single object). With only a little experience in region growing, it becomes obvious that there does not exist any single threshold for region merging that is acceptable, even for several different areas in a single scene.
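The basic atomic-area grower can be sketched as a breadth-first flood fill. This is a minimal sketch assuming a single scalar property per point and the reading "differ by at most θ", so that θ = 0 joins only equal-valued neighbors; the function name, the seed order (raster scan), and the label numbering are illustrative choices, not the program used for Figure 9.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_atomic_regions(img, theta, neighborhood=N4):
    """Label connected areas whose adjacent points differ by at most theta.

    img is a 2-D list of values; returns a same-shaped list of region labels 1, 2, ...
    """
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue
            next_label += 1            # unlabeled point: start a new region here
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:               # sequentially add adjacent points
                r, c = queue.popleft()
                for dr, dc in neighborhood:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols and not labels[rr][cc]
                            and abs(img[rr][cc] - img[r][c]) <= theta):
                        labels[rr][cc] = next_label
                        queue.append((rr, cc))
    return labels
```

On the toy image [[0, 1, 9], [2, 3, 9]], θ = 1 yields three regions, while θ = 2 already merges the four left-hand points into one, a small-scale version of the threshold sensitivity discussed below.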

Consider subimage B of Figure 1d, which includes on the right side an area of sky above the somewhat speckled roof, and on the left side tree foliage (reflective highlights and shadows), as well as sky or roof showing through in some places. Figure 9 shows the results of region growing (using 4-neighbor adjacency) with two values of θ; the conservative value does not grow the tree together, but a small increase (on a gray scale of 64 values) joins the roof to the tree. What is noise or textural variation in one area becomes a meaningful boundary in another. Thus, dynamic setting of thresholds is needed in the different areas, but that is a complex process to automate without global guidance or a priori knowledge. It is always difficult to determine whether a local discontinuity with respect to some feature(s) should bar further region growth or should be bridged as an internal variation of the region being formed. However, one meta-strategy is to form atomic areas far too conservatively and then seek additional means of merging these areas (Brice & Fenema [1970]). This can be effective, but it is very difficult to avoid all incorrect local merges; a single 'leak' between regions might cause very large changes in the final segmentation. Semantic constraints have been used to provide greater reliability.

³It is easy to formulate parallel region growing algorithms. In a spatial array processor such as the processing cones (described in section 3.1), every image point can act as an initial 'seed' point and all regions can grow simultaneously (with some being gobbled up by others).

Freuder [1976] provides an interesting variation on the region merging process by grouping those regions which are relatively more similar to each other than to other regions. This is continued, and a tree of regions is constructed up to a single region over the scene. This whole structure would be passed to a global semantic processor, which must extract the relevant information for different parts of the picture from nodes of the tree at varying levels of grouping. Potentially, this can be a powerful and flexible way to present information to semantic processes. However, it seems that the tree should be greatly pruned prior to semantic processing if it is to be useful. This leads to the difficult questions concerning texture that remain to be solved if this is to be a viable approach.


Figure 9:  A simple region grower, where regions are represented by a unique symbolic label (mod 99).

(a) Regions grown on intensity values of subimage B with θ = 3. (b) Regions grown with θ = 5.
5.2  Regions Under Semantic Guidance

The focus of this paper has been upon techniques that can be applied independently of the semantic context in which the computer vision system is operating. In many systems, stored models are used to match and then refine noisy and incomplete segmentations. Our general position was outlined in the introduction: it is desirable to perform an initial segmentation without use of semantic information. However, there are many opportunities to use such knowledge in the kinds of processes we have been examining.

This section outlines a couple of the more general attempts to integrate segmentation and interpretation by controlling the merging of atomic areas.

The decision-theoretic approach to image interpretation of Yakimovsky and Feldman [1973] addresses the difficulties previously outlined by introducing semantics in a decision-theory framework. Their segmentation process is based on merging atomic areas if the probability of a global interpretation is improved:

$$P\{\text{global interpretation} \mid \text{context, measurements}\} = \prod_{i} P\{R(i) \text{ is } INT(i) \mid \text{values of measurements on } R(i)\} \cdot \prod_{B(i,j)} P\{B(i,j) \text{ is between } INT(i) \text{ and } INT(j) \mid \text{measurements on } B(i,j)\}$$
where R(i) is region i, B(i,j) is the boundary between regions i and j,
and INT(i) is the semantic interpretation of region i. Several important
points should be noted. The identities of regions are assumed to be
independent of each other except for the relationship across their borders;
borders are also assumed to be independent of each other and to depend only
upon the regions to either side. These assumptions seem to be reasonable
approximations for local region interpretation. However, they fail when, for
example, a roof can appear on either side of a tree. Of key importance,
though, is that semantic information is introduced at the segmentation
level: regions can be merged if the merge improves the meaning of the
partitioned scene. The boundaries between pairs of regions are linked into
the region analysis, influencing both the segmentation and interpretation
processes.
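The product above can be made concrete with a small sketch. The following Python is not Yakimovsky and Feldman's program; it assumes the region and boundary probabilities are already available as lookup tables (`p_region` and `p_boundary` are hypothetical names, and the numbers are made up), works in log space to avoid numerical underflow, and finds the best labelling by exhaustive search, which is only feasible for toy cases.

```python
import itertools
import math


def log_global_score(interp, p_region, p_boundary, boundaries):
    """Log of P(global interpretation | context, measurements), assuming the
    factorization above: independent region terms times independent boundary terms.

    interp     -- dict: region id -> label INT(i)
    p_region   -- dict: (region id, label) -> P(R(i) is label | measurements on R(i))
    p_boundary -- dict: ((i, j), label_i, label_j) -> P(B(i,j) is between them | measurements)
    boundaries -- list of (i, j) pairs of regions sharing a boundary
    """
    score = sum(math.log(p_region[(r, lab)]) for r, lab in interp.items())
    score += sum(math.log(p_boundary[((i, j), interp[i], interp[j])])
                 for i, j in boundaries)
    return score


def best_interpretation(regions, labels, p_region, p_boundary, boundaries):
    """Exhaustive search over all len(labels) ** len(regions) labellings."""
    best, best_score = None, -math.inf
    for combo in itertools.product(labels, repeat=len(regions)):
        interp = dict(zip(regions, combo))
        score = log_global_score(interp, p_region, p_boundary, boundaries)
        if score > best_score:
            best, best_score = interp, score
    return best, best_score


# Toy example with made-up probabilities: two regions sharing one boundary.
p_region = {(1, "sky"): 0.7, (1, "ground"): 0.3,
            (2, "sky"): 0.2, (2, "ground"): 0.8}
p_boundary = {((1, 2), "sky", "ground"): 0.9, ((1, 2), "ground", "sky"): 0.1,
              ((1, 2), "sky", "sky"): 0.5, ((1, 2), "ground", "ground"): 0.5}
best, score = best_interpretation([1, 2], ["sky", "ground"],
                                  p_region, p_boundary, [(1, 2)])
print(best, round(math.exp(score), 3))  # {1: 'sky', 2: 'ground'} 0.504
```

With two regions and two labels there are only four candidate labellings, but the count grows as (number of labels) raised to (number of regions), which is why a heuristic search is needed for real scenes.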

Excellent  results  were  obtained  on  several  road  scenes  and  chest  x-rays. 

This approach integrates the segmentation and interpretation phases. It
also captures the flavor of the more recent relaxation schemes by allowing a
local hypothesis to be influenced by the context of other local hypotheses.
However, it loses the parallel computational flavor of model determination,
because decisions for merging regions are carried out sequentially.
Yakimovsky and Feldman avoid an exhaustive sequential search for the best
global interpretation by approximating it with the result of a heuristic
search.

There remains the extremely difficult problem of determining the
probabilities for this scheme. The probabilities for interpreting R(i)
seem feasible to obtain, but the relationships of boundaries in an
inherently three-dimensional world often vary uncontrollably. Although one
knows that sky is virtually never below ground (actually, a protrusion on a
mountain or a change in one's viewpoint would allow this), one cannot fix
the probability of seeing a car roof adjacent to grass without having a very
restricted micro-world. On the other hand, the inherently two-dimensional
spatial relationships of a chest in an x-ray photograph are relatively easy
to approximate. In the most general applications, their approach might have
fewer difficulties at later stages of processing, after an initial
segmentation has been obtained.

Tenenbaum and Weyl [1975] present a detailed analysis of a range of
strategies, including syntactic and semantic merging criteria. The simplest
nonsemantic measures involve comparisons of average differences in
properties of local areas immediately to either side of the common
boundary (one of the techniques employed by Brice and Fennema [1970] in
the fundamental early work on region growing), or of average properties of
the entire areas. The two regions with the weakest boundary are merged,
and this process is repeated. All the algorithms performed many correct
merges, but a few bad merges ('leaks' of one region into another) can
produce disastrous consequences. Another difficulty is the lack of
meaningful stopping criteria for the algorithms.
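A minimal sketch of the weakest-boundary merging loop, using the "average properties of the entire areas" criterion on a single feature. The region-adjacency representation and the `stop_threshold` parameter are our assumptions; as noted above, no meaningful stopping criterion is known, so the threshold is purely illustrative.

```python
def merge_weakest(means, sizes, adjacency, stop_threshold):
    """Repeatedly merge the adjacent pair of regions whose average feature values
    differ least (the 'weakest boundary'), updating the merged region's mean.

    means          -- dict: region id -> average feature value over the region
    sizes          -- dict: region id -> pixel count
    adjacency      -- iterable of (a, b) pairs of adjacent regions
    stop_threshold -- hypothetical cutoff: stop once the weakest boundary's
                      difference exceeds it
    Returns a dict mapping each original region id to the id it was merged into.
    """
    means, sizes = dict(means), dict(sizes)
    edges = {frozenset(p) for p in adjacency}
    owner = {r: r for r in means}

    def strength(edge):
        a, b = tuple(edge)
        return abs(means[a] - means[b])

    while edges:
        weakest = min(edges, key=strength)
        if strength(weakest) > stop_threshold:
            break
        keep, gone = sorted(weakest)
        total = sizes[keep] + sizes[gone]
        # Size-weighted mean of the merged region.
        means[keep] = (means[keep] * sizes[keep] + means[gone] * sizes[gone]) / total
        sizes[keep] = total
        del means[gone], sizes[gone]
        # Redirect the absorbed region's boundaries to the surviving region.
        edges = {frozenset(keep if r == gone else r for r in e) for e in edges}
        edges = {e for e in edges if len(e) == 2}
        for r in owner:
            if owner[r] == gone:
                owner[r] = keep
    return owner


owner = merge_weakest({1: 10, 2: 12, 3: 50, 4: 53}, {r: 10 for r in range(1, 5)},
                      [(1, 2), (2, 3), (3, 4)], stop_threshold=5)
print(owner)  # {1: 1, 2: 1, 3: 3, 4: 3}
```

Each merge is a purely local, greedy decision, so one early bad merge (a "leak") changes every later mean it touches; nothing in the loop can undo it.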
Tenenbaum and Barrow [1976] demonstrated that interactive human semantic
labelling of regions could be used to block most erroneous merges made by
nonsemantic rules. They interactively supplied identity labels to the
initial, conservatively formed atomic regions whose size is greater than
some threshold θ_s. Then an attempted merger of two regions with differing
labels can be blocked, the merger of an unlabelled region with a labelled
region inherits the available label, and the merger of two unlabelled
regions remains unlabelled. For those unlabelled regions that grow larger
than θ_s, the human again supplies the proper label. For a simple office
scene and an outdoor scene, the final results are quite reasonable when θ_s
is set so that about 20 regions are labelled during this process.
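The blocking rules can be written as a small decision function. This is an illustrative sketch: labels are plain strings, `None` marks an unlabelled region, `theta_s` stands for the size threshold θ_s, and both function names are hypothetical.

```python
def merge_labels(label_a, label_b):
    """Outcome of an attempted merge under the label-blocking rules.
    A label is a string, or None for an unlabelled region.
    Returns (merge_allowed, label_of_merged_region)."""
    if label_a is not None and label_b is not None and label_a != label_b:
        return False, None  # differing labels: the merge is blocked
    # Same label, or one side unlabelled (inherit), or both unlabelled (stay None).
    return True, label_a if label_a is not None else label_b


def needs_human_label(size, label, theta_s):
    """An unlabelled region that grows past theta_s is sent back to the human."""
    return label is None and size > theta_s


print(merge_labels("sky", "tree"))  # (False, None)
print(merge_labels(None, "roof"))   # (True, 'roof')
print(merge_labels(None, None))     # (True, None)
```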

This approach led Tenenbaum and Barrow to employ a generalization of
Waltz's [1975] constraint satisfaction approach on the region labels.
Constraint satisfaction can be viewed as a special type of relaxation
procedure in which relationships between labels in a local context are
used to eliminate some of the alternative labels. They extend the semantic
region merging process by alternating it with the propagation of semantic
constraints on the identity labels. For this approach to be automated, it
requires the initial labelling of all elementary regions (even individual
picture elements!) and the specification of computationally effective
procedures to extract the semantic relationships between regions.
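A sketch of the kind of constraint filtering involved, written in the spirit of Waltz's procedure rather than as Tenenbaum and Barrow's actual program. Each region carries a set of candidate labels, and a label is discarded when some neighbouring region has no compatible label left. The compatibility table below is a made-up example.

```python
def propagate(domains, neighbors, compatible):
    """Waltz-style filtering: discard a candidate label from a region whenever some
    adjacent region has no remaining label compatible with it; repeat until stable.

    domains    -- dict: region -> set of candidate labels (a filtered copy is returned)
    neighbors  -- dict: region -> set of adjacent regions
    compatible -- function(label_x, label_y) -> bool, the semantic constraint
    """
    domains = {r: set(labels) for r, labels in domains.items()}
    changed = True
    while changed:
        changed = False
        for r in domains:
            for label in list(domains[r]):
                # Unsupported if some neighbour has no label compatible with it.
                unsupported = any(
                    not any(compatible(label, m) for m in domains[n])
                    for n in neighbors[r])
                if unsupported:
                    domains[r].discard(label)
                    changed = True
    return domains


# Hypothetical constraints: unordered pairs of labels that may share a boundary.
ALLOWED = {frozenset(p) for p in [("sky", "roof"), ("roof", "grass"), ("sky", "tree")]}


def compatible(x, y):
    return frozenset((x, y)) in ALLOWED


result = propagate({1: {"sky"}, 2: {"roof", "tree", "grass"}, 3: {"grass"}},
                   {1: {2}, 2: {1, 3}, 3: {2}}, compatible)
print(result)  # {1: {'sky'}, 2: {'roof'}, 3: {'grass'}}
```

Region 2 loses "tree" because no neighbouring grass may touch a tree in this table, and loses "grass" because sky may not touch grass; only "roof" survives.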

However, the degree to which one can satisfactorily label the possible
interpretations of a small section of an object on the basis of purely
local information is still uncertain; with a large number of possible
objects this problem may be serious. The authors demonstrate examples
with this labelling supplied manually or directed via pre-defined geometric
models. The results are quite interesting, but extending this approach to
automatic segmentation of general scenes seems to be quite difficult. These
problems are discussed in a bit more detail in Arbib and Riseman [1976].

When  one  examines  the  effort  and  ingenuity  involved  in  trying  to 
keep  one  region  from  leaking  into  another,  one  can  only  conclude  that 
better  nonsemantic  data  will  be  required  to  guide  segmentation.  In 
particular,  it  suggests  the  need  for  analysis  of  feature  activity  in 
the  context  of  the  area  under  consideration.  For  merging  purposes,  one 
can  only  determine  relative  similarity  of  atomic  areas  in  the  context 
of  the  characteristics  of  atomic  areas  in  the  vicinity.  This  consideration 
leads to the more global feature analysis of the second major approach.

5.3 Region Formation via Global Feature Analysis

This approach is based on the premise that the global distribution
of feature activity in a scene contains sufficient information for
segmentation of major areas. If two regions have a distinct difference in
intensity, one would expect the intensity histogram to form major peaks
about their respective means. Figure 10 is a set of one-dimensional
histograms for subarea B in Figure 1d, and in certain places they have a
multimodal distribution of the type expected. One can try to form the
desired regions by separately turning on all image points in one or
another of the clusters. Automatic determination of cluster boundaries
based on histogram peaks may be simple or difficult, depending on the
particular case. If one examines cluster 2 of Figure 10a and the points
that are turned on in the image (Figure 11a), it appears that this approach
works nicely.

Ohlander [1975] developed a technique of recursively partitioning
an image by setting thresholds at valleys of 1D histograms of various
features. The first partition forms around the clearest peak in any
histogram. The associated points in the image are then turned on, and
adjacent points with the same label are merged into a region by growing
on the symbolic labels. These regions are smoothed by blurring, and each
distinct region becomes the basis for further histogram analysis. A region
is kept intact only when it is unimodal in all histograms employed.
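A minimal sketch of the peak-and-valley step, assuming feature values have already been binned into a 1D histogram. The valley rule used here (walk downhill from the highest bin until the counts start to rise) is our own simplification; Ohlander's actual peak-selection criteria were more elaborate, and a practical version would smooth the histogram first.

```python
def peak_interval(hist):
    """Bins [lo, hi] around the highest histogram peak, bounded on each side by
    the nearest valley (where counts stop falling as we walk away from the peak)."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    lo = peak
    while lo > 0 and hist[lo - 1] <= hist[lo]:
        lo -= 1
    hi = peak
    while hi < len(hist) - 1 and hist[hi + 1] <= hist[hi]:
        hi += 1
    return lo, hi


def is_unimodal(hist):
    """True if every non-empty bin lies inside the highest peak's interval."""
    lo, hi = peak_interval(hist)
    return all(count == 0 for i, count in enumerate(hist) if i < lo or i > hi)


def turn_on(image, lo, hi):
    """Boolean mask of pixels whose binned feature value falls in [lo, hi]."""
    return [[lo <= v <= hi for v in row] for row in image]


hist = [0, 2, 8, 3, 1, 0, 4, 9, 5, 1]
print(peak_interval(hist), is_unimodal(hist))  # (5, 9) False
```

In a recursive scheme, the mask from `turn_on` would be grown into connected regions, and each region's own histograms would be tested again until every one is unimodal.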

In order for this process to work, Ohlander subtracts out 'busy areas'
of texture and smaller detail by using a measure of the amount of edge
in each local area. These areas are processed by different techniques,
including the blurring operation previously mentioned.

Despite the obvious effectiveness of this procedure in some cases,
this type of histogram analysis has several deficiencies. Consider some
of the other histogram clusters in Figure 10: the peaks and cluster
widths are not so clear. A more serious problem, though, is that
different objects can have partially overlapping distributions in one or
all of the features. This can cause peaks and valleys to appear and
disappear as the particular combination of objects is varied, even though
all of the objects may appear visually distinct to a human observer. A
companion difficulty is that one cannot expect HSI (hue, saturation,
intensity) features to produce clusters in many types of texture. The
blurring operations employed by Ohlander will not be sufficient to deal
properly with the general characteristics of texture.

These points are emphasized when one examines the rest of the HSI
histograms in Figure 10. The two clusters in Figure 10a produce a
reasonable first approximation in delineating the sky area and tree
area (Figure 11a). However, when one maps the three saturation clusters
of Figure 10b back onto the image (Figure 11b), one finds that the clusters
associated with sky (3) and roof (1) are interspersed in the tree with
the last cluster (2). It begins to appear quite messy. Use of the
intensity histogram is also poor, in that tree and roof are in the same
cluster (Figure 11c). Although the other histograms will separate these
regions in the recursive splitting process, the formation of the tree as a
distinct region will not occur, because the hue mapping will be split by
the saturation mapping.

One can hope that the sequential determination of the largest regions
can be used to continually subtract away the data which obscure the
presence of less noticeable peaks. However, the quality of this algorithm
seems to be subject to an arbitrary condition, namely the particular mix
of regions being examined. This problem would probably be reduced if the
image were broken into smaller areas; this can be thought of as a foveal
window, where the system initially focuses in a directed manner upon a
subarea of the entire scene in far more detail.

Similarly, the peaks would have less chance of being obscured if
multi-dimensional histograms were employed (although then the detection of
peaks and clusters is less straightforward). Figure 12 depicts a possible
difficulty in discriminating different intensities and hues with
one-dimensional histograms. One might hope that other features would detect
differences in these cases. Of course, this problem can also occur in 2D
histograms and require one to go to higher-dimensional spaces, but at least
with 2D histograms pairwise dependencies are available, and this might be
sufficient. But there is a still more significant drawback that must be
overcome: the lack of information on the spatial relationships of the
features being examined.

When a histogram of a feature based on individual points is used
to form a region in the manner described, spatial information is employed
during this analysis only in terms of the adjacency of points which have
similar labels. On the basis of global histogram analysis, one cannot
determine the difference between a red area bordering a yellow area and red polka
(a) Hue, (b) Saturation, and (c) Intensity.

Figure 12: Potential problems with one-dimensional histograms.
(a) Assume four clusters of activity in a 2D histogram of hue vs.
intensity; the peaks of activity are assumed to be at the center of each
circle, with their heights along the axis coming out of the paper. (b) A 1D
histogram of either hue or intensity will obscure the clusters.
dots within a yellow area; they can produce identical histograms, and the
difference in structure is not seen. More generally, a texture can be
composed of a set of micro- or sub-texture elements. Each micro-texture
i could contribute toward a macro-texture with a given spatial distribution
p_i. These distributions might tend towards spatial randomness, or they
might form more structured geometrical relationships. The relative
occurrence of each micro-texture type can remain fixed and still allow a
virtually unlimited number of different textures.
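The polka-dot argument can be checked on two toy 4 x 4 "images" (hypothetical data). Their global histograms are identical, while a simple count of unlike adjacent pixel pairs exposes the difference in spatial structure. That count is an illustrative measure of our own, not one proposed in the text.

```python
from collections import Counter


def histogram(image):
    """Global histogram of pixel values: all spatial arrangement is discarded."""
    return Counter(v for row in image for v in row)


def unlike_neighbor_pairs(image):
    """Number of horizontally or vertically adjacent pixel pairs with different values."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and image[r][c] != image[r][c + 1]:
                count += 1
            if r + 1 < rows and image[r][c] != image[r + 1][c]:
                count += 1
    return count


R, Y = "red", "yellow"
bordering = [[R, Y, Y, Y],  # a red strip bordering a yellow area
             [R, Y, Y, Y],
             [R, Y, Y, Y],
             [R, Y, Y, Y]]
polka = [[R, Y, R, Y],      # red dots scattered within yellow
         [Y, Y, Y, Y],
         [R, Y, R, Y],
         [Y, Y, Y, Y]]
print(histogram(bordering) == histogram(polka))                         # True
print(unlike_neighbor_pairs(bordering), unlike_neighbor_pairs(polka))   # 4 12
```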

If many local areas possess similar characteristics, this is a cue
to texture, e.g., sky and foliage, newsprint on paper, or a simple
checkerboard. It should thus be possible to bind various types of
micro-texture characteristics into a single macro-textured region. If one
turns on either the blue or the green patches of sky-foliage texture for
separate analysis, only a partial texture is obtained and other kinds of
problems are introduced. It is clear that the spatial relationship of these
features is a fundamental aspect of this texture. Blurring to smooth these
regions and make them homogeneous is an alternative, but this produces its
own problems and does not get at the basis of texture. Our solution for
effective region growth calls for utilizing the strengths of both
approaches, integrating global feature activity with a local spatial region
growing process.

5.4 Integrating Spatial Analysis with Global Feature Analysis

The scheme we present for binding feature histograms to spatial
relationships in the image is described in more detail by Hanson, Riseman,
and Nagin [1975], and bears some similarity to a previous investigation
by Tomita, Yachida and Tsuji [1973]. In this approach, histograms of
various feature pairs are employed to find clusters of feature activity.

The system restricts its attention to a two-dimensional feature space
because the analysis of the histograms can then be carried out by operating
on them as pseudo-images in the processing cones described earlier (the
cone is essentially a general 2D array processor). The algorithms assume
the existence of a process which can dynamically select relevant features
and make them 'active'. The point has already been made that the
composition of the relevant features varies with the situation and with
the movement of a fovea, if one exists in the system. A low-level system
will need a front end for this selection process, and it would have to be
interfaced to the several other types of algorithms that we have presented.

Each feature is computed as a function of a local window of some
size (the windows may be overlapping or non-overlapping). As the size of
this window increases, the quality of the information changes depending
on the situation. If the window lies entirely within a region, then the
average and variance computed over the local area become statistically more
meaningful as the size increases. On the other hand, as the size increases
it becomes more and more likely that the window will overlap different
regions, and a 'mutant' value for the average and variance will be produced.
This is the window problem that we have already mentioned, and we shall
see shortly that this type of noise causes problems.
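The window problem is easy to reproduce. The sketch below computes the mean and variance over 3 x 3 windows across a hypothetical vertical step edge between two homogeneous regions (intensities 10 and 40); windows straddling the edge yield "mutant" means that belong to neither region, together with a spurious variance.

```python
def window_stats(image, r, c, half=1):
    """Mean and variance of the (2*half+1) x (2*half+1) window centred on (r, c),
    clipped at the image border."""
    vals = [image[i][j]
            for i in range(max(r - half, 0), min(r + half + 1, len(image)))
            for j in range(max(c - half, 0), min(c + half + 1, len(image[0])))]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


# Hypothetical step edge between two homogeneous regions.
image = [[10, 10, 10, 40, 40, 40] for _ in range(5)]
means = [window_stats(image, 2, c)[0] for c in range(6)]
variances = [window_stats(image, 2, c)[1] for c in range(6)]
print(means)      # [10.0, 10.0, 20.0, 30.0, 40.0, 40.0]
print(variances)  # [0.0, 0.0, 200.0, 200.0, 0.0, 0.0]
```

The means 20 and 30 would fall between the two clusters in a histogram, and the variance of 200 looks like texture that is not there; a larger window would widen the band of such values.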

Figures 13 and 14 are two simple examples of two-dimensional histograms
of subarea B (roof, tree, and sky) of our example scene: hue vs. intensity
(H vs. I), and intensity vs. saturation variation (I vs. S_V). Despite the
noise, it is quite clear that several major clusters in the different
feature spaces are correlated with the color image. The sky is bright blue
(cyan) and the roof a speckled red; these areas have little variation in
intensity and saturation. The tree area has a hue of various greens, blue,
and white; saturation and intensity variation is high because the surface
is irregular and there is large variation between the figure and the
background sky showing through.

Useful information appears in all of the histograms, and the clusters
in the different histograms are certainly correlated. A low-level system
should extract these dependencies and use the redundancy to increase
confidence in the results of this processing. Although some clusters
are quite clear and easy to extract, others are more amorphous and
widespread. The clarity and definition of a cluster may be degraded by the
window problem. A window that overlaps two adjacent homogeneous regions
will produce a value in between these clusters when computing the mean
of a feature, and false activity in the case of the variation of a feature.
This can produce trails between clusters.

In order to reduce the impact of the window problem on histograms,
the technique of non-maxima suppression discussed in Section 4.2 can be
utilized. In this application we have chosen to generalize it to a
process of non-extremum suppression: values which are not local minima
or local maxima in 8-adjacency neighborhoods are not allowed to contribute
to the histogram. This suppresses many of the trails between clusters
associated with smooth regions.

As the window moves across a boundary, the values of the means of a
window will change from min to max (or vice versa), while the values of
variations will increase and then decrease, so that only a


Figure 13: Use of two-dimensional histograms to guide region formation
in subimage B - Example 1, Hue vs. Intensity.

(a) A two-dimensional histogram of hue vs. intensity, with each character
representing the number of points having the associated values of H and I
on the respective axes; note that alphabetic characters represent values
between A = 10 and F = 15, punctuation and special characters represent
values between 16 and 45, while a period represents all values greater
than 45. (b) Non-extremum suppression is used to suppress image points
whose values are not local minima or local maxima. Analysis of both the
unsuppressed and suppressed histograms might allow the delineation of the
clusters which are enclosed in rectangles in the two histograms.
(c) Projection of clusters in the suppressed histogram back onto the image,
with suppressed points appearing as blanks. (d) Projection of the
unsuppressed histogram back onto the image.

Figure 14: Use of two-dimensional histograms to guide region formation
in subimage B - Example 2, Intensity vs. Saturation Variance.
Refer to Figure 13 for a more detailed explanation. (a) A two-dimensional
histogram of intensity vs. saturation variance (S_V), where S_V of each
point X is computed across a 3 x 3 window centered on X. (b) 2D histogram
after non-extremum suppression is applied. (c) Projection of clusters in
the suppressed histogram back onto the image. (d) Projection of clusters
in the unsuppressed histogram (a) back onto the image.
subset of the mutant values (the peaks) will be present. When forming
histograms of the features of individual points (e.g., HSI), the points
which are on a gradient between min and max values will be suppressed;
thus, in the case of tree-type texture, just the highlight and shadow
points will contribute. Figures 13c and 14c portray the position of
non-extremum points as blanks in the subimage. It should be noted that in
the case of two features, a point is not suppressed if its value is a min
or max in either of the two features. Figures 13b and 14b portray the 2D
histograms under non-extremum suppression, which clearly enhances the
clusters of interest. Although the results are not entirely predictable in
non-trivial areas of an image, cluster clarity should generally be improved.
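A sketch of non-extremum suppression feeding a 2D histogram, following the rule that a point contributes if it is a local minimum or maximum over its 8-neighbourhood in either feature. Treating ties as extrema (so points in flat interiors still contribute) is our assumption, and the feature values here are taken as already binned.

```python
from collections import Counter


def is_extremum(img, r, c):
    """True if img[r][c] is a local max or local min over its 8-neighbourhood
    (ties count, so points in flat areas are kept)."""
    v = img[r][c]
    nbrs = [img[i][j]
            for i in range(max(r - 1, 0), min(r + 2, len(img)))
            for j in range(max(c - 1, 0), min(c + 2, len(img[0])))
            if (i, j) != (r, c)]
    return all(v >= n for n in nbrs) or all(v <= n for n in nbrs)


def suppressed_histogram_2d(f1, f2):
    """2D histogram of (f1, f2) values that keeps a point only if it is an
    extremum in at least one of the two features."""
    hist = Counter()
    for r in range(len(f1)):
        for c in range(len(f1[0])):
            if is_extremum(f1, r, c) or is_extremum(f2, r, c):
                hist[(f1[r][c], f2[r][c])] += 1
    return hist


# Two window-mean features across a step edge: the middle columns are 'trail' values.
f1 = [[10, 10, 20, 30, 40, 40] for _ in range(3)]
f2 = [[5, 5, 6, 7, 8, 8] for _ in range(3)]
print(suppressed_histogram_2d(f1, f2))  # Counter({(10, 5): 6, (40, 8): 6})
```

The trail bins (20, 6) and (30, 7) vanish because those points lie on a gradient in both features. Had one feature been a window variance, its peak at the edge would have kept them, which is the "subset of the mutant values" described above.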

There  are  many  approaches  to  the  extraction  of  clusters.  A very  simple 
technique  utilized  by  Hanson  et  al.  [1975]  employed  the  cone  structure 
to  extract  clusters.  Clusters  of  activity  are  extracted  by  blurring 
(averaging) so that the activity is smoothed, and scaling so that
low-activity valleys between clusters disappear, leaving only the peaks.
This corresponds to Ohlander's threshold-setting process and, with some
difficulty, it can be automated. All of these operations (besides the
formation of the histogram itself) can easily be computed as local parallel
operations in the processing cone. Afterwards, the boundaries of each
cluster can be grown outward to capture the areas blurred and scaled away.
Although there are many more sophisticated clustering algorithms (see
Meisel [1972]) which might produce higher-quality results, they may be
exorbitant in computation or not as intrinsically parallel in nature.
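The blur, threshold, and grow procedure can be approximated serially as follows. This is a sketch, not the cone implementation of Hanson et al.: a 3 x 3 box average stands in for the cone's averaging, a fixed `threshold` (a hypothetical parameter whose automatic choice is the difficult part) replaces the scaling step, and clusters are then grown back over adjacent non-empty bins.

```python
def blur(grid):
    """3 x 3 box average with border clipping."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [grid[i][j]
                    for i in range(max(r - 1, 0), min(r + 2, rows))
                    for j in range(max(c - 1, 0), min(c + 2, cols))]
            out[r][c] = sum(vals) / len(vals)
    return out


NEIGHBOURS_4 = ((1, 0), (-1, 0), (0, 1), (0, -1))


def extract_clusters(hist, threshold):
    """Blur a 2D histogram, keep bins at or above `threshold`, label 4-connected
    groups of kept bins as clusters 1, 2, ..., then grow each cluster outward
    into adjacent non-empty bins that the threshold removed.
    Returns a grid of cluster ids (0 = no cluster)."""
    rows, cols = len(hist), len(hist[0])
    smoothed = blur(hist)
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if smoothed[r][c] >= threshold and not labels[r][c]:
                next_id += 1
                labels[r][c] = next_id
                stack = [(r, c)]
                while stack:  # flood fill over bins that survived the threshold
                    i, j = stack.pop()
                    for di, dj in NEIGHBOURS_4:
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols and not labels[ni][nj]
                                and smoothed[ni][nj] >= threshold):
                            labels[ni][nj] = next_id
                            stack.append((ni, nj))
    # Grow clusters back over non-empty bins lost to blurring and thresholding.
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if hist[r][c] and not labels[r][c]:
                    for di, dj in NEIGHBOURS_4:
                        ni, nj = r + di, c + dj
                        if 0 <= ni < rows and 0 <= nj < cols and labels[ni][nj]:
                            labels[r][c] = labels[ni][nj]
                            changed = True
                            break
    return labels


hist = [[0, 0, 0, 0, 0, 0],
        [0, 9, 9, 0, 0, 0],
        [0, 9, 9, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 8, 8],
        [0, 0, 0, 1, 8, 8]]
labels = extract_clusters(hist, threshold=3.5)
print(labels[1][1], labels[4][4], labels[5][3], labels[0][0])  # 1 2 2 0
```

The isolated low-count bin at row 5, column 3 is removed by the threshold and then recaptured by cluster 2 during the growing pass.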


It is clear that a system can use the information in the suppressed
and unsuppressed histograms to define clusters in the unsuppressed
histograms. We will assume that the clusters outlined in the histograms of
Figures 13 and 14 have been extracted. Each cluster is symbolically
labelled with a distinct numeric symbol. The next stage of processing
involves a feedback loop to correlate these features with their spatial
relationships in the original image. Points in the original image can
be labelled according to the cluster to which they belong. Figures
13c-d and 14c-d represent the image labelled by the clusters in the
suppressed and unsuppressed histograms of Figures 13 and 14, respectively.
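The feedback step that labels image points by cluster is essentially a table lookup. In the sketch below, `cluster_of_bin` is a hypothetical mapping from a pair of binned feature values to the cluster id extracted from the 2D histogram; points falling in no cluster receive `None`, corresponding to blanks in the projected figures.

```python
def project_labels(f1, f2, cluster_of_bin):
    """Label each pixel with the cluster containing its (f1, f2) histogram bin.

    f1, f2         -- equally sized 2D lists of binned feature values
    cluster_of_bin -- dict: (f1 value, f2 value) -> cluster id
    Pixels whose bin lies in no extracted cluster get None."""
    return [[cluster_of_bin.get((a, b)) for a, b in zip(row1, row2)]
            for row1, row2 in zip(f1, f2)]


hue = [[1, 1, 5], [1, 5, 5]]
intensity = [[0, 0, 3], [0, 3, 9]]
print(project_labels(hue, intensity, {(1, 0): 1, (5, 3): 2}))
# [[1, 1, 2], [1, 2, None]]
```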

In Figure 14d, derived from the I-S_V histogram, it is clear that two of
the clusters represent the roof (medium I, low S_V) and sky (high I, low
S_V). Note the two unlabelled rows in between these areas. They are caused
by false variations due to the window placement, and are incorrectly
grouped into clusters 3 and 4, which represent the tree-sky area, as shown
in Figure 14d. Since our window is 3 x 3, there will only be two rows which
have these false values, but the problem becomes more critical as the
window size increases. An 'intelligent' low-level system would understand
the strengths and weaknesses of each segmentation procedure. Thus, this
problem can be expected and, at least in this case, easily cleared up by
a little more processing.

Now  it  is  quite  straightforward  to  grow  regions  across  adjacent  points 
with  the  same  label  and  retain  those  regions  which  are  relatively  large. 
But areas of heavy texture might form several clusters in the histograms,
from highlights and shadows or from sky and foliage. Thus, the textured
area which one wants to extract may have local areas of varying labels,
with possibly many disconnected local areas of the same label. This
effect is hinted at in Figures 14c-d and is seen more vividly in Figure 15,
which is the result of a histogram of intensity mean vs. intensity variance
on 3 x 3 windows.
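Growing regions across adjacent points with the same label amounts to connected-component labelling on the symbolic image. A minimal sketch, assuming 4-adjacency and a hypothetical `min_size` cutoff for "relatively large" regions:

```python
def grow_regions(label_img, min_size):
    """Group 4-adjacent pixels sharing the same (non-None) label into regions and
    keep those with at least `min_size` pixels.
    Returns a list of (label, sorted pixel coordinates) in discovery order."""
    rows, cols = len(label_img), len(label_img[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            label = label_img[r][c]
            if label is None or seen[r][c]:
                continue
            seen[r][c] = True
            stack, pixels = [(r, c)], []
            while stack:  # depth-first flood fill over same-label neighbours
                i, j = stack.pop()
                pixels.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols and not seen[ni][nj]
                            and label_img[ni][nj] == label):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            if len(pixels) >= min_size:
                regions.append((label, sorted(pixels)))
    return regions


img = [[1, 1, 2, 2],
       [1, 1, 3, 2],
       [4, 3, 3, 2]]
print([(label, len(pixels)) for label, pixels in grow_regions(img, 3)])
# [(1, 4), (2, 4), (3, 3)]
```

In a heavily textured area this procedure yields many small same-label fragments rather than one region, which is exactly the difficulty the paragraph above describes.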

At this point we have atomic areas which represent either regions or
possible texture elements, depending on the resolution and the particular
object. Certainly any large area so formed which consists of a single
type of activity will be evaluated as an entity in itself, a region.
If these areas are small, however, they might be considered microtexture
elements.

The labelled points and areas provide the system access to statistical
and structural spatial properties of the feature types. This analysis
can be used to guide region growth across the symbolic labels. In an
initial attempt to extract simple properties, the VISIONS group (Hanson,
Riseman and Nagin [1975]) utilized an adjacency matrix as a measure of
the degree to which atomic areas of different types are interspersed.
Large numbers imply that two texture types are often adjacent to each other
and signal the possibility that they form one or more regions with a
macrotexture of these two (or more) types. By growing across the labels
representing those two microtextures, a single macrotextured region is
formed. We have depicted this case in Figure 14e by enlarging the cluster
to cover both clusters 3 and 4 in Figure 14a. Thus, we have captured
texture patterns in a single region, and it is a simple matter to extract
symbolic descriptors of the different microtextures used in the
construction of the macrotexture. We leave the analysis of the redundancy
of these different segmentation results for the reader's inspection.
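An adjacency count of the kind described can be sketched as follows. The exact form of the VISIONS adjacency matrix is not specified here, so this version simply counts 4-adjacent pixel pairs with different labels, keyed by the unordered label pair. Which counts are "large" is left open, and `merge_interspersed` is a hypothetical helper that grows across two labels by giving them a common macrotexture label.

```python
from collections import Counter


def adjacency_counts(label_img):
    """Count 4-adjacent pixel pairs carrying different labels, keyed by the
    unordered label pair. Large counts suggest two interspersed microtextures."""
    counts = Counter()
    rows, cols = len(label_img), len(label_img[0])
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r, c + 1), (r + 1, c)):  # right and down neighbours
                if nr < rows and nc < cols and label_img[r][c] != label_img[nr][nc]:
                    counts[frozenset((label_img[r][c], label_img[nr][nc]))] += 1
    return counts


def merge_interspersed(label_img, pair):
    """Give both labels in `pair` one macrotexture label (the smaller of the two)."""
    target = min(pair)
    return [[target if v in pair else v for v in row] for row in label_img]


# Hypothetical label image: labels 3 and 4 alternate like foliage and sky.
img = [[3, 4, 3, 4, 1],
       [4, 3, 4, 3, 1],
       [2, 2, 2, 2, 1]]
counts = adjacency_counts(img)
print(counts[frozenset({3, 4})])  # 10
merged = merge_interspersed(img, {3, 4})
print(merged[0])                  # [3, 3, 3, 3, 1]
```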

A more sophisticated analysis of the statistical and structural
characteristics is desirable. One would like to note the difference between
blue and green vertical stripes and blue irregularly shaped blobs amidst
a green background. This implies the utility of a hierarchical
feature-selector/texture-analyzer, which is the subject of ongoing research.
Figure 15: Interspersion of cluster labels in textured areas. These
projections of cluster labels are derived from a 2D histogram of intensity
mean vs. intensity variance computed on 3 x 3 windows. (a) Clusters
projected from suppressed histograms. (b) Clusters projected from
unsuppressed histograms.
6. Conclusion

Until  recently  individual  efforts  in  computer  vision  have  been 
rather  limited.  This  is  not  intended  to  be  a criticism  of  some  of  the 
fine  work  that  has  been  conducted.  However,  it  does  point  out  that  the 
complexity  of  vision  has  not  been  tested  against  a multi-level  systems 
approach  of  modular  processes  with  effective  means  of  communication  between 
them. Humans use the high degree of redundancy available in images in
order to understand them. Distant mountains have cues of perspective,
a blue color shift, upper boundary shapes, and further semantic constraints
which allow strong hypotheses about their identity. Similarly, there is
a redundancy of features and/or algorithms which can lead to consistent
segmentations in terms of regions and boundaries. Animals seem to
exhibit multiple representations at an early level to aid their
goal-oriented visual perception (Lettvin, Maturana, McCulloch and Pitts [1959]).

.This  paper  exhibited ‘ several  algorithms  for  the  extraction  of  boundaries 
or  regions.  Since  a representation  of  either  boundaries  or  regions  implicitly  defines  the 
ocher,  we  have  a means  of  integrating  their  results  in  terms  of  the  ideas  of 
competition  and  cooperation  (Arbib  & Rlseman  [1976]).  Relaxation  and  constralnt- 
^atisfaction  algorithms  may  afford  a general  mechanism  by  which  many  kinds 
")f  Inform-ition  can  be  integrated.  There  are  also  algorithms  for  region 
'rowth  on  labels  determined  by  global  analysis.  Each  algorithm  approaches 
the  data  from  a different  perspective  and  may  be  subject  to  different 
weaknesses.  A system  of  these  routines  could  allow  performance  beyond  the 
capability  of  any  single  algorithm  by  allowing  multiple  and  somewhat  redundant 
representations  to  determine  the  portions  of  the  segmentation  for  which  there 
is  high  confidence.  The  difficulty  with  this  approach  is  that  several  partially 
reliable segmentations could produce a maze of inconsistencies which are not easy
to resolve. There is also a significant overhead in additional computation
on a serial machine. However, it seems quite reasonable to view these
computations as taking place in the future on parallel hardware and in real time.
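The observation that a region representation implicitly defines a boundary representation (and vice versa), and that agreement among redundant segmentations can flag the portions that deserve high confidence, can be illustrated with a minimal sketch. It assumes each segmentation is an integer label map stored as nested lists, uses 4-connectivity, and marks a pixel as a boundary pixel when its right or lower neighbour carries a different label. A simple vote threshold stands in for the competition and cooperation mechanisms discussed above; all function names are hypothetical.

```python
from collections import deque

def boundaries_from_labels(labels):
    """Region map -> boundary map: a pixel is a boundary pixel if its
    right or lower 4-neighbour has a different region label."""
    h, w = len(labels), len(labels[0])
    edge = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                edge[y][x] = True
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                edge[y][x] = True
    return edge

def regions_from_boundaries(edge):
    """Boundary map -> region map: label 4-connected components of
    non-boundary pixels (1, 2, ...); boundary pixels keep label 0."""
    h, w = len(edge), len(edge[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for sy in range(h):
        for sx in range(w):
            if edge[sy][sx] or labels[sy][sx]:
                continue
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not edge[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

def confident_boundaries(label_maps, min_votes):
    """Keep only boundary pixels on which at least min_votes of the
    (partially reliable) segmentations agree."""
    maps = [boundaries_from_labels(m) for m in label_maps]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) >= min_votes for x in range(w)]
            for y in range(h)]
```

With two hypothetical segmentations of a 3 x 4 image, one splitting it into left and right halves and the other additionally separating the bottom row, only the vertical boundary segment shared by both survives a two-vote threshold. A relaxation scheme would replace the hard vote with iteratively updated confidences.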

A low-level  system  for  segmentation  should  have  a front-end  that 
allows  any  subset  of  a pool  of  features  to  be  invoked  for  use  by  the 
different  algorithms.  This  requires  mechanisms  to  select  relevant  features 
entirely  upon  the  basis  of  limited  processing  of  the  specific  image  under 
consideration.  It  is  one  example  of  the  need  for  binding  global  feature 
analysis  with  local  spatial  analysis.  One  might  use  global  histogram 
analysis  to  identify  clusters  of  feature  activity  for  ordering  the  potential 
importance  of  the  features.  Feedback  from  semantic  processes  after  initial 
segmentation  and  interpretation  can  provide  powerful  guidance  to  the  invoca- 
tion of  specific  features. 
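One way to use global histogram analysis for ordering the potential importance of features is sketched below. It assumes each feature is a list of per-pixel scalar values in a known range, counts the separated clusters (modes) in a lightly smoothed histogram, and ranks features with more modes first on the assumption that multimodal features are more likely to separate scene parts. The bin count, the smoothing kernel, the `min_fraction` peak threshold, and the ranking criterion are all illustrative assumptions, not the procedure used in this paper.

```python
def histogram(values, bins, lo, hi):
    """Bin scalar feature values into `bins` equal-width cells over [lo, hi)."""
    counts = [0] * bins
    for v in values:
        i = int((v - lo) * bins / (hi - lo))
        counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
    return counts

def count_modes(counts, min_fraction=0.05):
    """Count peaks of a [1, 2, 1]/4-smoothed histogram that reach at least
    min_fraction of the tallest smoothed bin."""
    n = len(counts)
    padded = [counts[0]] + counts + [counts[-1]]  # replicate edge bins
    s = [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4 for i in range(n)]
    floor = min_fraction * max(s)
    modes = 0
    for i in range(n):
        rises = i == 0 or s[i] > s[i - 1]       # strictly above left neighbour
        holds = i == n - 1 or s[i] >= s[i + 1]  # not below right neighbour (plateaus count once)
        if rises and holds and s[i] > 0 and s[i] >= floor:
            modes += 1
    return modes

def rank_features(features, bins=16, lo=0, hi=256):
    """Order feature names by number of histogram modes, most multimodal first."""
    scores = {name: count_modes(histogram(vals, bins, lo, hi))
              for name, vals in features.items()}
    return sorted(scores, key=lambda name: -scores[name])
```

For example, a hypothetical feature with two well-separated value clusters is ranked ahead of one whose values form a single cluster. Feedback from semantic processes could then override this purely data-driven ordering.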

We  have  examined  the  difficulties  produced  by  overlapping  feature 
distributions  from  different  parts  of  a scene.  This  confusion  in  the 
analysis  might  be  reduced  by  performing  a coarse  segmentation  by  edge 
analysis  on  a blurred  image  so  that  major  areas  can  be  delimited  and 
processed  independently.  In  our  example,  then,  areas  of  tree  texture  and 
the  straight  lines  within  the  house  might  be  analyzed  by  distinct  algorithms 
and/or  features  which  are  most  suitable  to  each.  At  this  point  in  our 
development  of  computer  vision  systems,  such  a level  of  generality  and 
flexibility  is  extremely  difficult  to  achieve.  However,  it  appears  to  be 
a natural  direction  for  integrating  the  broad  range  of  efforts  underway. 
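The suggested coarse segmentation by edge analysis on a blurred image can be sketched as follows. The sketch assumes a single-channel image stored as nested lists, a box (mean) filter as the blur, forward differences as the edge operator, and a fixed threshold. The radius, the threshold, and the synthetic scene (a fine checkerboard standing in for tree texture next to a flat bright area) are hypothetical choices made only to show that blurring suppresses fine texture while the major discontinuity survives.

```python
def box_blur(img, radius):
    """Mean filter over a (2r+1) x (2r+1) window, averaging only the
    in-bounds pixels near the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, n = 0, 0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[ny][nx]
                    n += 1
            out[y][x] = total / n
    return out

def coarse_edges(img, radius, threshold):
    """Blur, then mark pixels whose forward difference to the right or
    lower neighbour exceeds threshold."""
    b = box_blur(img, radius)
    h, w = len(b), len(b[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(b[y][x + 1] - b[y][x]) if x + 1 < w else 0.0
            dy = abs(b[y + 1][x] - b[y][x]) if y + 1 < h else 0.0
            edges[y][x] = max(dx, dy) > threshold
    return edges

# Hypothetical scene: 0/20 checkerboard texture on the left, flat 100 on the right.
scene = [[(20 if (x + y) % 2 else 0) if x < 4 else 100 for x in range(8)]
         for y in range(6)]
raw = coarse_edges(scene, radius=0, threshold=15)     # texture fires edges everywhere on the left
coarse = coarse_edges(scene, radius=1, threshold=15)  # only a band around the step remains
print([x for x in range(8) if all(row[x] for row in coarse)])  # -> [2, 3, 4]
```

The blurred edge band would then delimit two major areas that could be handed to different algorithms or feature sets, at the cost of losing boundary precision, which a later, finer analysis would have to recover.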


Some  workers  believe  that  implementation  of  general  computer  vision 
systems will not be within our grasp for some time. This paper has shown
that such a conclusion is not without justification. However, the
effectiveness of an integrated system approach has yet to be evaluated.
While research on general computer vision systems continues, it should
provide spin-offs in more constrained applications. There has already
been a focus upon ERTS satellite imagery, bio-medical applications,
and limited industrial assembly line work with promising results. It
is important to pursue the goals of both general vision and limited goal-oriented
vision systems during the coming years.
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segments.  Approaclies  to  region  formation  Include  region  growing  under  local 
spatial  guidance,  histograms  for  analysis  of  global  feature  activity,  and 
finally  an  integration  of  the  strengths  of  each  by  a spatial  analysis  of 
feature  activity.  A brief  discussion  of  attempts  by  others  to  Integrate 
the  segmentation  and  interpretation  phases  is  also  provided.  The  discussion 
is  supported  by  a variety  of  experimental  results. 
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